Matei Ciocarlie, Associate Professor of Mechanical Engineering, Fu Foundation School of Engineering and Applied Science
Carl Vondrick, YM Associate Professor of Computer Science, Fu Foundation School of Engineering and Applied Science
Joel Stein, Simon Baruch Professor of Physical Medicine and Rehabilitation; Chair, Department of Rehabilitation and Regenerative Medicine, Vagelos College of Physicians and Surgeons
While the disruptive power of generative learning may be best known for its role in large language models, Ciocarlie, Vondrick, and Stein want to bring generative learning to bear on a new world of languages: those of the human body.
By advancing and applying generative learning to the understanding of electromyographic signals – or electrical activity in muscles – the team aims to develop a wearable robotic device that can sense what activity a user is trying to perform, offering real-time physical assistance to stroke survivors and other people with motor impairments.
Emily Black, Assistant Professor, Computer Science, Barnard College
Talia Gillis, Associate Professor of Law and Milton Handler Fellow, Columbia Law School
Recent executive orders call for increased oversight to address algorithmic bias in applications ranging from consumer credit to housing to health care. Black and Gillis will partner to bridge the gap between legal anti-discrimination requirements, including the disparate impact doctrine, and the largely ad hoc nature of many current algorithmic fairness solutions, creating new technical and legal frameworks.
In the first stage of their work, they will focus on fair lending law, exploring how current compliance regimes attempt to prevent algorithmic discrimination. They will then develop new frameworks, sensitive to the full decision-making pipeline, for searching for less discriminatory algorithms.
Kaizheng Wang, Assistant Professor of Industrial Engineering and Operations Research, Fu Foundation School of Engineering and Applied Science
Sharon Di, Associate Professor of Civil Engineering and Engineering Mechanics, Fu Foundation School of Engineering and Applied Science
Does successful deployment of self-driving cars on San Francisco streets mean the vehicles can be safely introduced into New York City traffic?
To make that assessment, Wang and Di will develop innovative transfer learning methods to evaluate existing driving algorithms and construct a traffic simulator to analyze future ones.
While their work aims specifically to help cities more safely adopt autonomous vehicles, the tools they propose to develop could be used in a broader range of scenarios where policy-makers use data from one domain to assess safety in another.
Bianca Howard, Assistant Professor of Mechanical Engineering
Anthony Vanky, Assistant Professor, Graduate School of Architecture, Planning and Preservation
While the impact of climate change is global, people’s experiences of it vary widely, from habits and attitudes to sensitivity to heat and air pollution, so effective decarbonization plans need a tailored approach.
Through household interviews, participatory sensor measurements, and machine learning techniques, Howard and Vanky will analyze qualitative and quantitative data to uncover the patterns and trends that drive energy use and how they relate to individuals’ lived experiences. Their findings could open up new opportunities for reducing greenhouse gas emissions through holistic, justice-centered decarbonization plans tailored to specific buildings and communities.
Kechna Cadet, Postdoctoral Fellow, Epidemiology, Mailman School of Public Health
Silvia Martins, Professor of Epidemiology, Mailman School of Public Health
Smaranda Muresan, Research Scientist, Data Science Institute and Visiting Associate Professor, Barnard College
The social media platform Reddit offers surprisingly candid conversations about a range of public health issues, including substance use and other high-risk behaviors. To use these posts as a window into the experiences of populations at high risk for HIV, Cadet, Martins, and Muresan will analyze this text using human-assisted machine learning and natural language processing (NLP).
They will convert narratives into usable, meaningful information for analysis, exploring patterns and themes to uncover the reasoning behind behaviors. They will also work to determine whether their findings can improve understanding of real-life behaviors and, ultimately, interventions for them.
Jennifer L. Sokoloski, Research Scientist, Columbia Astrophysics Laboratory
Savannah Thais, Associate Research Scientist, Data Science Institute
When the Rubin Observatory’s Legacy Survey of Space and Time (LSST) begins full operations in Chile in 2025-2026, its unprecedented images of the southern sky will offer data about the cosmos at a scale and complexity never before seen in astrophysics.
To lay the groundwork for a decade of astrophysical discoveries using this new resource, Sokoloski and Thais plan to use precursor data and simulated LSST data to design a machine-learning-based research plan for LSST data. The plan aims to uncover an important, previously hidden population of binary stars and, in the process, to conduct a census of accreting white dwarfs in our galaxy, with implications for the origin of the supernovae used to study dark energy.
Kaveri A. Thakoor, Ophthalmology, Vagelos College of Physicians & Surgeons
Steven K. Feiner, Computer Science, Columbia Engineering
This DSI seed project aims to combine the pattern-recognition power of AI with the domain expertise of human medical experts to engineer human-vision-informed AI systems that improve the accuracy and interpretability of eye disease detection. We are one of the first teams to train AI systems on the eye movements of experts as they view ophthalmic images during disease diagnosis, with the goal of creating more trustworthy and accurate AI systems. The resulting systems could expedite disease detection, aid medical education, and potentially reveal novel ocular diagnostic features.
Jason Healey, Saltzman Institute of War and Peace Studies, School of International and Public Affairs (SIPA)
Savannah Thais, Data Science Institute
The SIPA CYsyphus (pronounced “SIGH-si-fis”) Cyber Recommendations Project is a decision-support tool that does the heavy lifting required to mine existing cyber reports and the expertise of the cybersecurity community. The project uses data science and machine learning to create a searchable database of recommendations, aiming to reduce the time needed to research and propose cyber policy decisions by an order of magnitude. The broader research has included collaboration with Jennifer E. Lake, University of Texas at Austin.
Hod Lipson, Mechanical Engineering, Columbia Engineering
Simon Billinge, Materials Science and Applied Physics and Applied Mathematics, Columbia Engineering
This project will explore the use of deep generative networks to determine the structure of complex molecules automatically, directly from X-ray powder diffraction images. The project will search for an end-to-end deep network able to determine the full three-dimensional electron density field (i.e., the “shape” of the molecule) directly from a one-dimensional diffraction strip. A variety of ML model architectures will be explored and applied to synthetic data generated by simulated powder diffraction experiments on groups of relatively simple molecules. The project focuses specifically on powder crystallography because, although it is a much more difficult problem than single-crystal crystallography, it can be applied to a broad range of materials and applications. This challenge is comparable in significance to the protein-folding problem.
Yvon L. Woappi, Physiology & Cellular Biophysics, Dermatology, Columbia University Medical Center; and Biomedical Engineering, Columbia Engineering
Bianca Dumitrascu, Statistics, Graduate School of Arts and Sciences; and Irving Institute for Cancer Dynamics (IICD)
The complex cellular events necessary to achieve mammalian tissue regeneration remain unknown. Our research pairs machine-learning-powered gene target identification with high-throughput interventional functional genomics to pinpoint the causal combinations of genetic and molecular changes necessary to promote wound regeneration.
Peter Bearman, Interdisciplinary Center for Innovative Theories and Empirics (INCITE), Graduate School of Arts and Sciences
Mark Olfson, Psychiatry and Epidemiology, Columbia University Medical Center
This project aims to use computational and machine learning methods to expand, and demonstrate the usefulness of, a novel data structure that captures, at a granular level, current inequalities in access to mental health treatment in the U.S. It will also examine the impact of these inequalities on suicide, a leading cause of death and suffering in our society.
Valerie Purdie-Greenaway, Psychology
Alfredo Spagna, Psychology
Peter Bearman, Sociology
Jennifer Manly, Neurology
Smaranda Muresan, DSI, Computer Science
This team will develop a shared understanding of how diversity and inclusion (D&I) is conceptualized and studied in the academic literature, and will compare academic research on D&I with what appears in popular press outlets. The project will draw on social psychology, organizational behavior, and social-cognitive neuroscience to create a baseline for understanding the structure of scientific knowledge related to D&I and to understand what kinds of D&I research find their way into the popular press.
Gerard Torrats-Espinosa, Sociology
Kara Rudolph, Public Health
This team proposes to create a novel linkage of police administrative records that capture highly detailed information on all search warrants that the Chicago Police Department executed from 2012 to 2020. They will document spatial and temporal patterns of search warrant use across Chicago’s neighborhoods.
Jeffrey A. Fagan, Law, Public Health
Rajiv Sethi, Barnard, Economics
Elizabeth Ananat, Barnard, Economics
Morgan C. Williams, Jr., Barnard, Economics
Brendan O’Flaherty, Economics
José Luis Montiel Olea, Economics
This project will create a data archive on non-fatal injuries and fatalities from police encounters—data that may be harmonized and integrated with other increasingly detailed datasets on police killings—and provide estimates of a continuum of police use of force. The new database will provide capacity and research opportunities for departments, schools, laboratories, and students across the university on an urgent public policy issue.
Upmanu Lall, Earth and Environmental Engineering
Bolun Xu, Earth and Environmental Engineering
This project combines data-driven renewable energy simulations with model-based storage pricing to quantify the financial value of various energy storage technologies for integrating renewables and mitigating climate change in a decarbonizing electric power system.
Xiaofan (Fred) Jiang, Electrical Engineering
Daniel Westervelt, Lamont-Doherty Earth Observatory
This team will develop and apply a novel, globally applicable bias correction algorithm for a fast-growing global network of consumer-grade, low-cost air quality sensors. The method will allow users to obtain high-quality data from raw, unvalidated sensor readings, empowering communities to better understand their exposure to air pollution and take action.
Veronica Barcelona, Nursing
Kenrick Cato, Nursing
Dena Goffman, Obstetrics and Gynecology
Coretta Green, New York-Presbyterian
Anita Holman, Obstetrics and Gynecology
Janice James Aubey, Obstetrics and Gynecology
Bernadette Khan, New York-Presbyterian
Kenya Robinson, New York-Presbyterian
Maxim Topaz, Nursing
This team will examine the association between linguistic bias and pregnancy-related morbidity among birthing people at two hospitals from 2017 to 2019. They will use natural language processing approaches to 1) identify stigmatizing language in clinical notes, 2) examine patterns of language use by race and ethnicity, and 3) study associations between language use and pregnancy-related morbidity.
James Anderson, Electrical Engineering
Michael Mauel, Applied Physics
Jeffrey Levesque, Applied Physics
Fusion science seeks to advance our fundamental understanding of physics and make plasma fusion viable for applications such as clean energy production. Tokamak fusion reactors generate vast and rich data sets obtained through multiple sensing modalities. The goal of this project is to develop new robust and efficient methods rooted in randomized numerical linear algebra for analyzing and characterizing complex fusion discharge dynamics.
Billy Caceres, Nursing
Ipek Ensari, Data Science Institute
Kasey Jackman, Nursing
This pilot study will use data science techniques to leverage ecological momentary assessment and consumer sleep technology to phenotype sleep health profiles in Black and Latinx sexual and gender minority adults. The investigators will use 30 days of daily electronic diaries and actigraphy to examine the associations of daily exposure to minority stressors (such as experiences of discrimination and anticipated discrimination) with sleep health among Black and Latinx sexual and gender minority adults.
Sean Luo, Psychiatry
Min Qian, Biostatistics
Kara Rudolph, Epidemiology
Pharmacologic treatment of opioid use disorder (OUD) is complicated by the likely absence of a one-size-fits-all best approach; rather, “optimal” dose and dose adjustment are hypothesized to depend on person-level factors, including factors that change over time, reflecting how well the individual is responding to treatment. This team will use harmonized data from multiple existing clinical trials with natural variability in OUD medication dose adjustments over time to 1) learn optimal dosing strategies, and 2) estimate the extent to which such optimal dosing strategies could reduce risk of treatment drop-out and relapse.
Colin Wayne Leach, Psychology, Africana Studies
Courtney Cogburn, Social Work
Sining Chen, Industrial Engineering and Operations Research
Kathleen McKeown, Computer Science
Susan McGregor, Data Science Institute
Social media is a powerful means of individual expression and of collective consolidation of people’s sentiment about the most important issues in our society. This transdisciplinary project marries the latest advances in computational and statistical techniques for analyzing language use over time with social behavioral theories of emotion and stress to examine the temporal dynamics of tweets surrounding police killings of Black people and the subsequent protests (e.g., Black Lives Matter).
Aviv Landau, Data Science Institute
Desmond Patton, Social Work
Maxim Topaz, Nursing
This team is developing an innovative artificial intelligence system to detect and assess risk for child abuse and neglect in hospital settings, designed to prioritize preventing and reducing bias against Black and Latinx communities.
Jacqueline Gottlieb, Neuroscience
Vince Dorie, Associate Research Scientist, Data Science Institute
In this project, online behavioral data will be collected from a large sample of participants using a battery of tasks that probe different theories of how information is prioritized and used. The combined dataset will allow an analysis of the latent factors that shape human information demand while also unifying those theories.
Ruth DeFries, Ecology, Evolution and Environmental Biology
Arlene Fiore, Earth and Environmental Sciences
Jeff Goldsmith, Biostatistics
Marianthi-Anna Kioumourtzoglou, Environmental Health Sciences
John Wright, Electrical Engineering
This team will develop methods to extract patterns from multiple datasets and identify the dominant sources of air pollution across India and how they vary in space and time. Their work is a step towards the overarching goal of informing effective clean air solutions and reducing public health burdens associated with exposure to air pollution in India.
Kriste Krstovski, Data Science Institute
Yao Lu, Sociology
This team combines new sources of labor market data with data science methods to identify the factors and environments that shape gender and racial inequality in the high-skilled labor market. The team will chart the long-term career trajectories of a large number of high-skilled American workers and examine gender and racial variations in them. It will also construct measures of company environment, especially as it pertains to gender and racial equity, and assess its consequences for the career paths of different groups of skilled workers.
Itsik Pe’er, Computer Science
Anne-Catrin Uhlemann, Medicine
This team is developing methods for temporal analysis of gut microbiome composition to better define the risk of infection in liver transplant recipients. They will integrate existing coarse-resolution data with newly collected deep metagenomics and metabolomics data.
Elham Azizi, Biomedical Engineering
Jellert Gaublomme, Biological Sciences
Brent Stockwell, Biological Sciences
This team will develop probabilistic models to elucidate the role of intercellular interactions in driving the susceptibility of treatment-resistant mesenchymal tumor cells to a newly discovered ferroptotic vulnerability. This work could offer a therapeutic avenue for preventing the survival of these metastasis-prone cancer cells.
Rene Hen, Neuroscience and Psychiatry
Sergey Kalachikov, Chemical Engineering
Major depressive disorder is a debilitating illness that affects more than 350 million people around the world. The most common treatments are drugs such as Prozac. About half of the patients who take the pills, however, do not respond to treatment. This team is thus trying to understand the molecular mechanisms of such treatment resistance. Ultimately, they would like to be able to predict which people will respond to antidepressant drugs before they begin treatment, and to develop new treatments that can circumvent antidepressant resistance in the millions of people who do not respond now to antidepressants.
Matthias Preindl, Electrical Engineering
Alan West, Chemical Engineering
This engineering team is developing a machine-learning model that can estimate a lithium-ion (Li-ion) battery’s state of charge with greater accuracy, aiming for an estimation error of just one percent.
Szabolcs Marka, Physics
Zsuzsanna Marka, Physics
Zelda Moran, Public Health
This team is pioneering a machine-learning-based imaging and sorting solution that aims to drastically reduce Africa’s tsetse fly population. The solution separates male from female tsetse flies to support the Sterile Insect Technique, which the International Atomic Energy Agency (IAEA) has used to eradicate tsetse populations in Zanzibar and elsewhere.
John Paisley, Electrical Engineering
Kai Ruggeri, Health Policy and Management
This research team intends to reduce missed appointments at community clinics by using big data and Bayesian machine learning techniques to understand why patients miss appointments and what can be done to help them keep those appointments.
Pierre Gentine, Earth and Environmental Engineering
Marco Giometto, Civil Engineering and Engineering Mechanics
Mostafa Momen, Civil Engineering and Engineering Mechanics
Carl Vondrick, Computer Science
This team is developing machine-learning models and improved satellite-imaging techniques that will help environmental officials locate and characterize hazardous pollutants in the lower atmosphere, allowing them to design strategies to mitigate pollution.
Xi Chen, Computer Science
Sharon Di, Civil Engineering and Engineering Mechanics
Qiang Du, Applied Physics and Applied Mathematics
Eric Talley, Law
This team is developing a fundamental game-theoretic framework to model the strategic interactions between conventional human-driven vehicles and autonomous and/or connected vehicles. Beyond technical advances, the project will also address the trolley problem (i.e., the development of ethical judgment) in autonomous vehicle (AV) algorithm design.
Roxana Geambasu, Computer Science
Daniel Hsu, Computer Science
Nicholas Tatonetti, Biomedical Informatics
This team is building an infrastructure system for sharing privacy-preserving machine learning models of large-scale, dynamic, clinical datasets. The system will enable medical researchers in small clinics or pharmaceutical companies to incorporate multitask feature models learned from big clinical datasets to bootstrap their own machine learning models on top of their (potentially much smaller) clinical datasets. The multitask feature models protect the privacy of individual records in the large datasets through a rigorous method called differential privacy.
Trenton Jerde, Zuckerman Institute
Nikolaus Kriegeskorte, Zuckerman Institute
Nima Mesgarani, Electrical Engineering
Chris Wiggins, Applied Physics and Applied Mathematics
This team will build a complementary mechanism for the web-based sharing of reasoned judgments, using machine learning algorithms to perform probabilistic inference on contentious claims and to bring rationality to the social web.
Michael Collins, Computer Science
David Kipping, Astronomy
This team will build predictive models capable of intelligently optimizing telescope resources, and uncover the rules and regularities in planetary systems, specifically through the application of grammar induction methods used in computational linguistics.
David Blei, Statistics
Anna Lasorella, Pediatrics
Raul Rabadan, Systems Biology
Wesley Tansey, Systems Biology
This team aims to model, predict, and target therapeutic sensitivity and resistance in cancer. They will integrate Bayesian modeling with recently developed variational inference and deep learning methods and apply them to large-scale genomic and drug sensitivity data across many cancer types.