Seed Fund Program

The DSI Seed Fund Program supports research collaborations between data scientists and domain experts.

Proposals for the 2025 Seed Fund Program closed on November 6th, 2024.

Program Goals

The DSI Seed Fund Program aims to advance research that brings together data science and domain expertise in new collaborations between Columbia faculty. Our goal is to foster long-term and deep interdisciplinary research relationships across the University, and to drive promising research initiatives forward.

The Data Science Institute is currently inviting new proposals for the Seed Fund Program, focused on these University strategic priority areas:

Artificial Intelligence (AI)
Climate
Mental Health
Social Justice

This year, DSI will award four grants, each with a maximum award of $125,000 for use over the period of one year. Grants will be awarded in January 2025, with projects set to begin on July 1, 2025 and end on June 30, 2026.

Program Terms

Proposals should represent new collaborations between faculty who have not previously worked together.
Proposals should be grounded in research that has the potential to garner future funding from government, industry or foundations.
The Seed award term will be approximately 18 months. DSI will aim to issue project awards in early January. From there, PIs will have six months for project set-up, including hiring, budgeting, planning, and securing access to necessary data sets.
Projects will run from July 1 to June 30 of the following year. The entire award amount is expected to be expensed in this period. Any funds that remain after June 30, 2026 will revert to the Institute.
DSI Seed Fund awards should be viewed as planning grants for future solicitations from DARPA, NIH, NSF, and others. We encourage awardees to submit external grant proposals through the Institute.

All proposals should discuss responsible and ethical use of data.

Funding

Funding Amount	Up to $125,000
Awards Per Term	Up to four projects will be awarded per cohort
Funding Period	One year of funding will be awarded (July 1 – June 30)

Funding Guidelines

Funding may be used for, but is not limited to:
- Operating supplies
- Equipment
- Travel directly related to research
- Publication costs

Seed funds may be used to support the hiring of PhD students, postdoctoral researchers, and student assistants.

Restrictions

Program funding cannot be used for:
- Faculty salary support, including summer salary support
- Salary support for DSI Research Scientists. Although DSI Research Scientists can participate on proposal teams, their salary support is already covered by the Institute.
- Salary support for project partners external to Columbia University

No-Cost-Extension Policy: The Seed Fund program does not allow for no-cost extensions. All project funding that is not expensed by June 30 will revert to the Institute.

There will be a six-month project set-up period; projects that do not have the identified personnel in place by June 15 (two weeks before the official project start date of July 1) will not have funding released. In this instance, the DSI team will schedule a conversation to discuss the project’s viability and determine whether a delayed start date is feasible. The Institute reserves the right to rescind the award if the PI does not provide appropriate plans and details regarding the project delay.

Eligibility Requirements

Each proposal must have at least two collaborators, each of whom is a Columbia University faculty member eligible to serve as a Principal Investigator according to Columbia policy.
PIs must be from two distinct departments within the University; applications that feature two faculty from the same department will not be accepted.
Faculty should not have previously collaborated.
Postdoctoral researchers may not serve as a PI for this seed award.

Seed Grant Terms

Reporting	Awardees will be required to submit progress reports.
- A six-month progress report will be required on January 15, 2025. The report must detail progress to date and plans to seek additional funding based on your research.
- A final report will be required on June 15.
- DSI will ask awardees to share updates on any publications, grant/gift funding, and other evidence of progress for several years after the project has ended.
Research Presentations	Awardees must be willing to present their research at future DSI events or seminars, including presentations to groups such as the DSI Executive Committee and the Data and Society Council, among others.
DSI Community Engagement	Awardees may be asked to review future Seed Fund proposals as part of service to the Institute. Additionally, we encourage award recipients to actively engage with the broader DSI community. Consider the following opportunities to deepen your involvement:
- Participate in the Capstone Program by submitting a research project for MSDS student participation.
- Mentor an MSDS student.
- Serve as a reviewer for DSI postdoctoral applications.
- Volunteer for or participate in the annual Data Science Day, held each spring semester.
- Join a DSI Research Center.
- Provide research opportunities to students through the Campus Connections program.
DSI Acknowledgment	Awardees should acknowledge DSI’s support in any papers, publications, or reports resulting from award fund activities.
External Funding	The goal of the Data Science Institute Seed Fund Program is to support new collaborations that will lead to longer-term and deeper relationships among faculty in different disciplines across Columbia University. We expect that this seed money will provide a basis for your team to submit larger proposals, and we ask that you submit external funding proposals and opportunities related to this research through the Data Science Institute.

Avenues for Collaboration

The Data Science Institute has a number of pathways to source additional support for your research:

DSI Research Scientists and Scholars represent a wide range of expertise, from the foundations of data science to domains where data science is heavily used. Collaborating with a DSI research scientist or scholar may accelerate your research project. Please reach out to them directly if this is of interest.
The DSI Scholars Program matches Columbia students with opportunities to engage in data science-related research projects led by Columbia faculty.
Campus Connections is another program that may be available to you if you find you need student research support after you have received an award.

Another avenue for potential collaborators is the Columbia Bridge to PhD Program in STEM.

The Bridge to the Ph.D. Program in STEM is a structured, post-baccalaureate opportunity that aims to diversify the STEM professoriate and workforce. By including a Bridge to the Ph.D. candidate in your DSI Seed Fund research proposal, you help increase pathways for underrepresented students to advance in STEM disciplines. The Office of the Vice Provost for Faculty Advancement covers 70% of the scholar’s salary and fringe, with 30% (~$17K) expected from the sponsoring principal investigator (PI). Your DSI Seed Fund budget is eligible to cover the PI’s expected cost for sponsoring a scholar.

Criteria for Proposal

Seed Fund proposals will be assessed against the criteria below. Please consider addressing these questions in your proposal.

Why is the proposed project novel? Additionally, describe the novelty of the collaboration in terms of people, disciplines, and/or schools. Contrasting the project with prior work is recommended.
How does this proposal align with one or more of the following priority areas?
- AI
- Climate
- Mental Health
- Social Justice
Why is seed funding essential to the success of this project?
How is the project inter-/multi-disciplinary?
Please outline your plans for securing future funding for this project, particularly your strategy for pursuing large-scale funding opportunities. Specifically, identify the external funding sources you plan to apply for and provide a detailed timeline for these applications.

All projects must be relevant to advancing and/or applying data science as a field.

Questions can be directed to dsi-seed@columbia.edu or to Radhika Patel, Chief Operating Officer at the Data Science Institute.

Recent Seed Fund Projects

Noam Elcott, Associate Professor, Dept of Art History and Archaeology, School of Arts & Sciences

Kathleen McKeown, Henry and Gertrude Rothschild Professor of Computer Science, SEAS

In this work, we focus on assessing and developing models that detect which artist is behind a given art image and generate explanations of the aesthetic features behind each prediction. We aim to advance both computational understanding and humanistic interpretation of how multimodal models generate and understand art images through probing of a model’s latent space. Latent spaces are abstract, high-dimensional areas within neural networks where patterns and relationships are encoded but not readily interpretable by humans. Studies of latent space are still nascent, but they offer important opportunities to better understand generative AI. A collaborative effort between computer scientists, science and technology studies (STS) scholars, art historians, and legal scholars, this interdisciplinary study investigates the intersection of artificial intelligence, image interpretation in latent space, and cultural analysis. By combining cutting-edge computational techniques with traditional humanistic inquiry, we aim to critically analyze how these models organize, encode, and produce images from textual inputs, revealing implicit biases, aesthetic assumptions, and the cultural knowledge embedded in machine learning systems.

Micah Goldblum, Assistant Professor, Dept of Electrical Engineering, School of Engineering and Applied Science

Arian Maleki, Associate Professor, Dept of Statistics, Graduate School of Arts and Sciences

James Anderson, Assistant Professor, Dept of Electrical Engineering, School of Engineering and Applied Science

Despite the promise of LLMs for automating data science, existing language models are severely limited in their ability to ingest and understand tabular data (data in the form of spreadsheets) as well as time series. We propose to build large-scale table and time series comprehension datasets for training multimodal large language models that can readily comprehend tabular data.

Ning Qian, Associate Professor, Dept of Neuroscience, School of Medicine

Tian Zheng, Professor and Chair, Dept of Statistics, School of Arts & Sciences

Transformers and their variants are the most powerful sequence processors in AI. Biological visual processing is also sequential because of our small fovea and frequent saccadic eye movements. By comparing the two systems, we find both similarities and major differences. In this project, we propose to integrate current neuroscience discoveries of transsaccadic visual processing and AI research on vision transformers, with the goals of improving the efficiency of training vision transformers and providing computational insights into biological vision.

Ying Wei, Professor of Biostatistics, Director of TRAIL4Health, Dept of Biostatistics, Mailman School of Public Health

James Noble, Associate Professor, Dept of Neurology, Taub Institute for Research on Alzheimer’s Disease and the Aging Brain

Wenpin Hou, Assistant Professor, Dept of Biostatistics, Mailman School of Public Health

The project aims to develop a prototype of an AI platform, TRANSFORM-AD, which uses advanced transformer models and comprehensive nationwide Alzheimer’s data to forecast disease trajectories at the individual level, and to develop utility tools that uncover key mechanistic insights and guide personalized care. This pilot will evaluate the platform’s potential to provide robust, trustworthy AI-driven solutions that enhance Alzheimer’s research, improve treatment strategies, and support precision healthcare.