The Data Science Institute’s Seed Funds Program supports new collaborations with the goal of developing longer-term and deeper interdisciplinary relationships among faculty at Columbia. Aimed at advancing research that combines data science with domain expertise, the selected projects embody the spirit of DSI’s mission and address the technical strengths needed to create fairer and more ethical use of data.

For 2023, the Institute has selected five outstanding projects, reflecting solution-oriented approaches to issues in healthcare access and social inequalities; disease diagnosis and treatment; cybersecurity and public policy; and materials science and discovery. The awarded research teams will accelerate their projects over the course of the next year, with access to DSI’s extensive network of scientists, scholars, and technical experts. 

The list of projects and Principal Investigators is included below. We congratulate the teams on their accomplishment. 


Creating an Expert–AI Team for Eye Disease Detection Driven by Expert Gaze Data
Kaveri A. Thakoor, Ophthalmology, Vagelos College of Physicians & Surgeons
Steven K. Feiner, Computer Science, Columbia Engineering

This DSI seed project aims to combine the pattern-recognition power of AI with the domain expertise of human medical experts to engineer human-vision–informed AI systems for enhanced eye disease detection accuracy and interpretability. We are among the first teams seeking to train AI systems on the eye movements of experts as they view ophthalmic images during disease diagnosis, with the goal of creating more trustworthy and accurate AI systems. The resulting systems could expedite disease detection, aid in medical education, and offer the potential to discover novel ocular diagnostic features.


CYsyphus (Cybersecurity Recommendations Project)
Jason Healey, Saltzman Institute of War and Peace Studies, School of International and Public Affairs (SIPA)
Savannah Thais, Data Science Institute

The SIPA CYsyphus (pronounced “SIGH-si-fis”) Cyber Recommendations Project is a decision-support tool that does the heavy lifting required to mine existing cyber reports and the expertise of the cybersecurity community. The project uses data science and machine learning to create a searchable database of recommendations, to reduce by an order of magnitude the time needed to research and propose cyber policy decisions. The broader research has included collaboration with Jennifer E. Lake of the University of Texas at Austin.


Deep Generative Powder Crystallography
Hod Lipson, Mechanical Engineering, Columbia Engineering
Simon Billinge, Materials Science and Applied Physics and Applied Mathematics, Columbia Engineering

This project will explore the use of deep generative networks to automatically determine the structure of complex molecules directly from X-ray powder diffraction images. The project will search for an end-to-end deep network able to determine the full three-dimensional electron density field (i.e., the “shape” of the molecule) directly from a one-dimensional diffraction strip. A variety of ML model architectures will be explored and applied to synthetic data generated by simulated powder diffraction experiments on relatively simple molecule groups. The project will focus specifically on powder crystallography because, while it is a much more difficult problem than solid crystallography, it can be applied to a broad range of materials and applications. This challenge is as significant as the protein-folding problem.


Interventional Representation Learning for Intelligent Wound Healing Strategies
Yvon L. Woappi, Physiology & Cellular Biophysics, Dermatology, Columbia University Medical Center; and Biomedical Engineering, Columbia Engineering
Bianca Dumitrascu, Statistics, Graduate School of Arts and Sciences; and Irving Institute for Cancer Dynamics (IICD)

The complex cellular events necessary to achieve mammalian tissue regeneration remain unknown. Our research pairs machine learning-powered gene target identification with high-throughput interventional functional genomics to pinpoint the causal genetic and molecular combinatorial changes necessary to promote wound regeneration.


Pre-Pandemic and Ongoing Barriers to Mental Health Care Access, Social Disadvantage, and Suicide
Peter Bearman, Interdisciplinary Center for Innovative Theories and Empirics (INCITE), Graduate School of Arts and Sciences
Mark Olfson, Psychiatry and Epidemiology, Columbia University Medical Center

This project aims to use computational and machine learning methods to expand and demonstrate the efficacy of a novel data structure that captures, at a granular level, current inequalities in access to mental health treatment in the U.S., and to examine the impact of these inequalities on suicide, a leading cause of death and suffering in our society.