Health Care

About the Focus Area

Data science is a driver of health care transformation.

From the development of new technologies and devices to targeted interventions to improve health outcomes, data-based solutions are reshaping the health care landscape and improving the efficiency and effectiveness of medicine. Researchers and practitioners apply data science principles and techniques to better understand health processes and transform health care delivery for better diagnoses, better care, and better cures.

Our Health Analytics Center, which is located at the Columbia University Irving Medical Center, facilitates collaborations between researchers from medicine, biology, public health, informatics, computer science, applied mathematics, and statistics. These thought leaders combine techniques from the growing field of data science with subject-matter expertise from their respective disciplines to improve the health of individuals and health care systems.

For example, precision medicine aims to find the right drug for the right patient at the right moment. Such accuracy is especially crucial in cancer treatments; exact causes vary between different tumors, and no two tumors have the same set of alterations. DSI-affiliated researchers from statistics, biomedical informatics, and cell biology are mapping a comprehensive set of causes to model, predict, and target therapeutic sensitivity and resistance of cancer for better treatment. Other experts in biomedical informatics, biomedical engineering, radiology, urology, and pathology have collaborated to use data science with magnetic-resonance imaging physics to improve prostate cancer diagnosis and staging. Yet another team sought to understand tumor microbiology and determined that bacteria in pancreatic tumors actually degrade a popular chemotherapy drug.

Data is also considered an organizational asset and plays an increasingly vital role in clinical administrative decision-making. Virtually every hospital and clinic collects detailed medical records about its patients, but hospitals are wary of sharing data with other institutions due to privacy concerns. Researchers in engineering, computer science, and biomedical informatics are building an infrastructure for sharing machine learning models trained on large-scale clinical datasets to rapidly advance innovation in clinical data research while safeguarding patients’ privacy.

DSI graduate students also collaborate with faculty to complete capstone projects and apply data science techniques to real challenges for the health professions. A recent student team used deep learning methods to build a model to predict whether high-resolution images of lung tissue showed evidence of pulmonary fibrosis. Another group partnered with New York City’s Department of Health and Mental Hygiene to scrape public social media posts from Twitter and Reddit and build a web app in R Shiny to flag posts associated with depression. By determining which genes influence biofilm structures, our M.S. students have also suggested ways to help make conventional antibiotics more effective and helped develop tools hospitals may use to diagnose bacterial conditions and treat infections.

Related Centers

Health Analytics

Cybersecurity

Smart Cities

Materials Discovery Analytics

Research Highlights

OHDSI, pronounced “odyssey”, is a multi-stakeholder, interdisciplinary collaborative to bring out the value of health data through large-scale analytics. All of its solutions are open-source. OHDSI has established observational health databases with a Columbia-based central coordinating center and an international network of researchers, including DSI members David Blei, Noemie Elhadad, George Hripcsak, David Madigan, Karthik Natarajan, Nicholas Tatonetti, Ying Wei, and Chunhua Weng.
This project develops methods for temporal analysis of gut microbiome compositions to better define the risk of infections in liver transplant recipients. The team integrates existing coarse resolution data with newly collected deep metagenomics and metabolomics data.
- Itsik Pe’er, Computer Science
- Anne-Catrin Uhlemann, Infectious Diseases
Using multiple nationally representative, large-scale exposure and cancer incidence datasets, this project builds a novel model-inference system to study the dynamics of colorectal cancer and to test a range of risk mechanisms over the life course. The goal is to identify key risk factors underlying the recent increase in young-onset colorectal cancer incidence in the U.S. and to support more effective early prevention.
- Wan Yang, Epidemiology
- Mary Beth Terry, Epidemiology
- Jianhua Hu, Biostatistics
- Piero Dalerba, Pathology and Cell Biology
This project leverages machine learning techniques to combine two types of single-cell data modalities to achieve a more comprehensive characterization of heterogeneous cell states in the tumor microenvironment. The team develops probabilistic models to elucidate the role of intercellular interactions in driving susceptibility of treatment-resistant mesenchymal tumor cells to a newly discovered ferroptotic vulnerability, which could offer a therapeutic avenue to prevent survival of these cancer cells that are prone to metastasis.
- Elham Azizi, Biomedical Engineering
- Jellert Gaublomme, Biological Sciences
- Brent Stockwell, Biological Sciences
This team models, predicts, and targets therapeutic sensitivity and resistance in cancer, addressing the need for new approaches that match drugs to genomic profiles. To improve accuracy, their predictions are validated in models designed in Lasorella’s lab. They also integrate Bayesian modeling with variational inference and deep-learning methods, drawing on the expertise of two leading teams in computational genomics (Rabadan’s group) and machine learning (Blei’s group).
- David Blei, Statistics, Computer Science
- Raul Rabadan, Systems Biology, Biomedical Informatics
- Anna Lasorella, Pathology and Cell Biology, Pediatrics
- Wesley Tansey, Systems Biology
The team is building system infrastructure that will let large hospitals share data with smaller clinics and use differential privacy to respect boundaries between organizations. The long-term goal is not only to develop a library of useful private transferable knowledge (PTKs) for the medical community, but also to integrate them into a data sharing infrastructure system that creates, maintains, and optimizes multiple PTKs in support of a wide range of clinical datasets and research tasks.
- Roxana Geambasu, Computer Science
- Nicholas Tatonetti, Biomedical Informatics
- Daniel Hsu, Computer Science