We support research collaborations between data scientists and domain experts.
DSI Seed Funds Program
Funding Opportunity for Faculty and Research Staff
Every field has data. We use data to discover new knowledge, to interpret the world, to make decisions, and even to predict the future. The recent convergence of big data, cloud computing, and novel machine learning algorithms and statistical methods is causing an explosive interest in data science and its applicability to all fields. This convergence has already enabled the automation of some tasks that improve human performance. The novel capabilities we derive from data science will drive our cars, treat disease, and keep us safe. At the same time, such capabilities risk leading to biased, inappropriate, or unintended action. The design of data science solutions requires both excellence in the fundamentals of the field and expertise to develop applications which meet human challenges without creating even greater risk.
The Data Science Institute (DSI) at Columbia University is a world leader in the theory and practice of the emerging field of data science. We advance the state of the art in data science; transform all fields, professions, and sectors through the application of data science; and ensure the responsible use of data for the benefit of society. It is in this context that DSI promotes “Data for Good”: using data to address societal challenges and bringing humanistic perspectives as—not after—new science and technology is invented.
The DSI Seed Funds Program supports new collaborations that will lead to longer-term, deeper relationships among faculty in different disciplines across campus. Aimed at advancing research that combines data science expertise with domain expertise, the program’s funded research should embody the spirit of the Institute’s mission. We are interested in applying data science to all domains and in bringing ideas from other domains to bear on data science. Aligned with these efforts, we are particularly interested in proposals that address one or more of DSI’s focus areas: Business and Finance, Climate, Foundations of Data Science, Health, and Social Justice.
DSI is pleased to announce a call for proposals from Columbia University faculty and research staff for its Seed Funds Program. We seek proposals that represent new collaborations, which will ideally lead to future proposal submissions to government, industry, or foundations.
As such, DSI Seed Funds may be viewed as planning grants for upcoming solicitations from DARPA, NIH, NSF, etc. We encourage proposals that address the technical challenges in the fair and ethical use of data.
Statement on Racial Equity
DSI is committed to racial equity and justice. Proposals should explicitly state that the project will uphold these values, e.g., by stating that the methods used to collect and analyze project data, and the project outcomes reported, are fair, just, and ethical.
This year, DSI will provide two levels of funding. We will accept proposals with budgets up to $25,000, as well as proposals with budgets up to $75,000, annually, for a maximum of two years. The $25,000 grants are intended for projects where significant salary support of Ph.D. students, postdoctoral researchers, or research scientists is not needed. As a condition of funding, awardees will be required to submit quarterly financial reviews and biannual progress reports. Eligibility for continued funding for a second year will also require a progress report. All reports must include progress on external funding proposal submission(s) and other related activities (presentations, publications, etc.).
Applicants are encouraged to include in their budgets support for DSI research scientists and for the university’s Bridge to the Ph.D. Program in STEM.
DSI research scientists and scholars represent a wide range of expertise, from the foundations of data science to domains where data science is heavily used. Collaborating with a DSI research scientist or scholar may accelerate your research project.
The Bridge to the Ph.D. Program in STEM is a structured, post-baccalaureate opportunity that aims to diversify the STEM professoriate and workforce. By including one of its scholars in your DSI Seed Funds research proposal, you help increase pathways for underrepresented students to advance in STEM disciplines. The Office of the Vice Provost for Faculty Advancement covers 70% of the scholar’s salary and fringe, with 30% (~$17K) expected from the sponsoring principal investigator (PI). Your DSI Seed Funds budget is eligible to cover the PI’s expected cost for sponsoring a scholar.
The deadline for proposal submission is Tuesday, November 15, 2022. We will not accept incomplete or late submissions. Please submit the following materials via email as a single .doc or .pdf file to email@example.com:
- Application cover page
- Project proposal
- Racial equity statement
- CVs for faculty / collaborators (two-page, NSF-style format)
We anticipate notifying award recipients by December 21, 2022.
Data Science and Health Initiative (DASHI)
PI Eligibility Criteria: Any faculty member/research scientist across the campus at either Morningside or CUIMC. Postdocs can participate in a collaborative team.
DASHI Budget: DASHI expects to fund at least 2 projects, up to $75k each.
DASHI invites applications for pilot projects at the intersection of Artificial Intelligence (AI) and Health Sciences. A Letter of Intent (LOI) by a single contact-PI without a complete project team is required. Please use this form to complete the LOI. The LOI needs to include:
- Tentative project title
- 150-word summary that can be shared with the Columbia community
- Description of complementary fields from which co-PI(s) are to be recruited to complement the project team
LOI Submission Deadline: Tuesday, November 23, 2021
Pre-Proposal Brainstorming Workshop: DASHI will encourage the formation of interdisciplinary project teams through a cross-campus brainstorming workshop on Friday, December 10, 2021 (9:00 AM – 12:00 PM ET – virtual). All who intend to submit an LOI should register for the workshop. Those who will not submit an LOI are also encouraged to register to meet others and potentially join an existing project team. Register for the workshop here.
Accepted LOI summaries will be presented at the brainstorming workshop. This event will include a plenary presentation session, followed by breakout groups dedicated to refining the proposals and finding others who may complement each team. Each project team formed this way may then submit a full, 3-page application for seed funding of the pilot project. The final proposal should delineate a path towards external funding of the project beyond its pilot stage.
LOI Submission Deadline: Tuesday, November 23, 2021
Workshop acceptance notification: By Friday, December 3, 2021
Brainstorming workshop: Friday, December 10, 2021
Full pilot proposal deadline: Friday, January 7, 2022
Notification: Tuesday, January 18, 2022
Expected project start for selected proposals: Tuesday, February 1, 2022
Funded Research Projects
Szabolcs Marka, Physics; Zsuzsanna Marka, Physics; Zelda Moran, Public Health; John Wright, Electrical Engineering
This team is pioneering a machine-learning-based imaging and sorting solution that aims to drastically reduce Africa’s tsetse population. The solution allows for the sorting of male and female tsetse flies to support the Sterile Insect Technique, which the IAEA has used to eradicate tsetse populations in Zanzibar and other countries.
Pierre Gentine, Earth and Environmental Engineering; Marco Giometto, Civil Engineering and Engineering Mechanics; Mostaf Momen, Civil Engineering and Engineering Mechanics; Carl Vondrick, Computer Science
This team is developing machine-learning models and improved satellite-imaging techniques that will help environmental officials locate and characterize hazardous pollutants in the lower atmosphere, allowing them to design strategies to mitigate pollution.
Marianthi-Anna Kioumourtzoglou, Environmental Health Sciences; John Paisley, Electrical Engineering; Kai Ruggeri, Health Policy and Management
This research team intends to reduce missed appointments at community clinics by using big data and Bayesian machine learning techniques to understand why patients miss appointments and what can be done to help patients keep them.
Aviv Landau, Data Science Institute; Desmond Patton, Social Work; Maxim Topaz, Nursing
This team is developing an innovative artificial intelligence system to detect and assess risk for child abuse and neglect within hospital settings that would prioritize the prevention and reduction of bias against Black and Latinx communities.
Matthias Preindl, Electrical Engineering; Alan West, Chemical Engineering
This engineering team is developing a machine-learning model that can estimate a Li-Ion battery’s charge level with greater accuracy, aiming for an error rate of just one percent.
David Blei, Statistics; Anna Lasorella, Pediatrics; Raul Rabadan, Systems Biology; Wesley Tansey, Systems Biology
This team aims to model, predict, and target therapeutic sensitivity and resistance of cancer. They will integrate Bayesian modeling with recently developed variational inference and deep learning methods and apply them to large scale genomic and drug sensitivity data across many cancer types.
Xi Chen, Computer Science; Sharon Di, Civil Engineering and Engineering Mechanics; Qiang Du, Applied Physics and Applied Mathematics; Eric Talley, Law
This team is developing a fundamental framework that uses a game-theoretic approach to model the strategic interactions of conventional human-driven vehicles and autonomous and/or connected vehicles. Beyond technical advances, this project will also address the Trolley Problem (i.e., ethical sense development) in AV algorithm design.
Michael Collins, Computer Science; David Kipping, Astronomy
This team will build predictive models capable of intelligently optimizing telescope resources, and uncover the rules and regularities in planetary systems, specifically through the application of grammar induction methods used in computational linguistics.
Roxana Geambasu, Computer Science; Daniel Hsu, Computer Science; Nicholas Tatonetti, Biomedical Informatics
This team is building an infrastructure system for sharing privacy-preserving machine learning models of large-scale, dynamic, clinical datasets. The system will enable medical researchers in small clinics or pharmaceutical companies to incorporate multitask feature models learned from big clinical datasets to bootstrap their own machine learning models on top of their (potentially much smaller) clinical datasets. The multitask feature models protect the privacy of individual records in the large datasets through a rigorous method called differential privacy.
Trenton Jerde, Zuckerman Institute; Nikolaus Kriegeskorte, Zuckerman Institute; Nima Mesgarani, Electrical Engineering; Chris Wiggins, Applied Physics and Applied Mathematics
This team will build a complementary mechanism for web-based sharing of reasoned judgments to perform probabilistic inference on contentious claims with machine learning algorithms and bring rationality to the social web.
Ruth DeFries, Ecology, Evolution and Environmental Biology; Arlene Fiore, Earth and Environmental Sciences; Jeff Goldsmith, Biostatistics; Marianthi-Anna Kioumourtzoglou, Environmental Health Sciences; Daniel Westervelt, Lamont-Doherty Earth Observatory; John Wright, Electrical Engineering
This team will develop methods to extract patterns from multiple datasets and identify the dominant sources of air pollution across India and how they vary in space and time. Their work is a step towards the overarching goal of informing effective clean air solutions and reducing public health burdens associated with exposure to air pollution in India.
Kriste Krstovski, Data Science Institute; Yao Lu, Sociology
This team combines new sources of labor market data with data science methods to identify factors and environments that shape gender and racial inequality in the high-skilled labor market. The team will chart long-term career trajectories of a large number of high-skilled American workers and examine gender and racial variations. It will also construct measures of company environment, especially as it pertains to gender and racial equity, and assess its consequences for the career paths of different groups of skilled workers.
Itsik Pe’er, Computer Science; Anne-Catrin Uhlemann, Medicine
This team is developing methods for temporal analysis of gut microbiome compositions to better define the risk of infections in liver transplant recipients. They will integrate existing coarse resolution data with newly collected deep metagenomics and metabolomics data.
Piero Dalerba, Pathology and Cell Biology; Jianhua Hu, Biostatistics; Mary Beth Terry, Epidemiology; Wan Yang, Epidemiology
This team will build a novel model-inference system to study the dynamics of colorectal cancer, test a range of risk mechanisms over the life course, and identify key risk factors underlying the recent increase in young-onset colorectal cancer incidence in the United States to support more effective early prevention.
Elham Azizi, Biomedical Engineering; Jellert Gaublomme, Biological Sciences; Brent Stockwell, Biological Sciences
This team will develop probabilistic models to elucidate the role of intercellular interactions in driving susceptibility of treatment-resistant mesenchymal tumor cells to a newly discovered ferroptotic vulnerability, which could offer a therapeutic avenue to prevent survival of these cancer cells that are prone to metastasis.