Data Science Is Transforming How Research Is Conducted at Columbia

DSI Seeds-Fund Grants Support Pioneering Research at Columbia University

One team aims to eradicate the tsetse fly, a parasitic insect that has decimated cattle in sub-Saharan Africa. A second team is working to characterize hazardous pollutants in the atmosphere, while a third team of researchers has identified genetic pathways in the brain associated with depression, a discovery that could improve treatments for the millions of people diagnosed with the disease.

These are three research projects funded recently by the Data Science Institute Seeds Fund Program. DSI awarded the grants to a total of five research teams whose overarching objective is to merge data science with traditional fields to solve societal problems. Each of the five teams will receive up to $100,000 per year and be eligible for a second year of funding. This is the second year of the Seeds Fund program; last year five teams also received the grants. 

The program encourages collaborations between researchers from various disciplines and departments throughout Columbia, so that the most advanced data science techniques will infuse and transform several fields at the university, whether public health and physics, genetics and healthcare, or environmental science and journalism.

“In awarding these grants, the DSI review committee selected projects that brought together teams of scholars who will push the state-of-the-art in data science while using data science responsibly and ethically,” said Jeannette M. Wing, Avanessians Director of the Data Science Institute. “The five winning teams combine data-science experts with domain experts who together aspire to solve wicked hard societal problems.”

The Seeds Fund Program is just one of many initiatives that Wing has spearheaded since the summer of 2017, when she was named director of DSI. Her other initiatives include founding a Post-Doctoral Fellows Program, a Faculty Recruiting Program, and an Undergraduate Research Program. She also hired leading research scientists, helped organize workshops and a bootcamp on "Data for Good" for the Obama Foundation Scholars, and continues to host Data Science Day, an annual conference that brings together leaders in the field from industry, government and academia. This year's conference is scheduled for April 3, 2019, when the DSI Seeds Fund teams (this year's winners as well as five teams from 2018) will present their pioneering research.

What follow are brief descriptions of the five winning research projects.

Machine Learning to the Rescue

This team is pioneering a machine-learning-based imaging and sorting solution that aims to drastically reduce Africa’s tsetse population. The solution sorts male and female tsetse flies to support the Sterile Insect Technique, which the International Atomic Energy Agency (IAEA) has used to eradicate tsetse populations in Zanzibar and other countries. The technique uses irradiation to render large numbers of male flies infertile. The flies are then released into breeding grounds, where they mate with female flies. Since the females usually mate only once in a lifetime, these matings produce no offspring, which will drastically reduce the tsetse population and help stop the spread of the disease the flies carry.

Tracking Air Pollutants

This team is developing machine-learning models and improved satellite-imaging techniques that will help environmental officials locate and characterize hazardous pollutants in the lower atmosphere, allowing them to design strategies to mitigate pollution. The models will use machine learning to track how pollution plumes are transported by atmospheric turbulence, which controls the dispersion of contaminants in the lower atmosphere.

Nudging New York

This research team intends to reduce missed appointments at community clinics by using big data and Bayesian machine-learning techniques to understand why patients miss appointments and what can be done to help them attend. The researchers have partnered with the Community Healthcare Network, a federally funded clinic in New York City serving disadvantaged communities.

Molecular Mechanism of Treatment-Resistant Depression

Major depressive disorder is a debilitating illness that affects more than 350 million people around the world. The most common treatments are drugs such as Prozac. About half of the patients who take these drugs, however, do not respond to treatment. This team is trying to understand the molecular mechanisms of such treatment resistance. Ultimately, the researchers would like to predict which people will respond to antidepressant drugs before they begin treatment, and to develop new treatments that can circumvent antidepressant resistance in the millions of people who do not currently respond.

Data-Driven Modeling and Estimation of Li-Ion Battery Properties


This engineering team is developing a machine-learning model that can estimate a Li-Ion battery's charge level with greater accuracy, aiming for an error rate of just one percent. Battery Management Systems are trained to capture a battery's state of health and to predict its remaining lifetime. These two measures are important because they help owners of electric vehicles know when to stop to recharge the battery and when to schedule battery replacements. Furthermore, a model with high estimation accuracy translates into a longer lifetime for battery packs.

—by Robert Florida


550 W. 120th St., Northwest Corner 1401, New York, NY 10027    212-854-5660
©2018 Columbia University