The Data Science Institute (DSI) at Columbia University is pleased to announce the latest recipients of its 2025 Seed Fund Program. From revolutionizing art analysis through AI to advancing personalized care for Alzheimer’s disease, this year’s awarded projects bring together faculty from Engineering, Arts and Sciences, Public Health, and Medicine. The selected projects reflect the Data Science Institute’s commitment to cultivating new research partnerships that bridge Columbia’s rich ecosystem of disciplines and work at the forefront of data science and AI-driven innovation.

This year’s awardees are:

Art Images and AI: Latent Space Interpretability, Art History, and the Law

  • Noam Elcott, Art History and Archaeology, Graduate School of Arts and Sciences
  • Kathleen McKeown, Computer Science, Fu Foundation School of Engineering and Applied Science

This project assesses and develops models that identify the artist behind a given art image and generate explanations of the aesthetic features behind each prediction. The aim is to advance both computational understanding and humanistic interpretation of how multimodal models generate and understand art images by probing a model’s latent space. Latent spaces are abstract, high-dimensional spaces within neural networks where patterns and relationships are encoded but not readily interpretable by humans. Studies of latent space are still nascent, but they offer important opportunities to better understand generative AI. A collaborative effort among computer scientists, science and technology studies (STS) scholars, art historians, and legal scholars, this interdisciplinary study investigates the intersection of artificial intelligence, image interpretation in latent space, and cultural analysis. By combining cutting-edge computational techniques with traditional humanistic inquiry, the PIs aim to critically analyze how these models organize, encode, and produce images from textual inputs, revealing the implicit biases, aesthetic assumptions, and cultural knowledge embedded in machine learning systems.

Language Models for Tabular and Time Series Comprehension

  • Micah Goldblum, Electrical Engineering, Fu Foundation School of Engineering and Applied Science
  • Arian Maleki, Statistics, Graduate School of Arts and Sciences
  • James Anderson, Electrical Engineering, Fu Foundation School of Engineering and Applied Science

Despite the promise of large language models (LLMs) for automating data science, existing LLMs are severely limited in their ability to ingest and understand tabular data (that is, data in the form of spreadsheets) and time series. The PIs will build large-scale table and time series comprehension datasets for training multimodal large language models that can readily comprehend such data.

Integrating Biological Visual Processing and Vision Transformers

  • Ning Qian, Neuroscience, Vagelos College of Physicians and Surgeons
  • Tian Zheng, Statistics, Graduate School of Arts and Sciences

Transformers and their variants are the most powerful sequence processors in AI. Biological visual processing is also sequential, because the human fovea is small and the eyes make frequent saccadic movements. Comparing the two systems reveals both similarities and major differences. In this project, the PIs will integrate current neuroscience discoveries about transsaccadic visual processing with AI research on vision transformers, with two goals: improving the efficiency of training vision transformers and providing computational insights into biological vision.

TRANSFORM-AD: A Pilot Transformer-based AI Platform for Personalized Alzheimer’s Disease Progression Forecasting and Intervention Learning

  • Ying Wei, Biostatistics, Mailman School of Public Health
  • James Noble, Taub Institute for Research on Alzheimer’s Disease and the Aging Brain
  • Wenpin Hou, Biostatistics, Mailman School of Public Health

The project aims to develop a prototype AI platform, TRANSFORM-AD, that uses advanced transformer models and comprehensive nationwide Alzheimer’s data to forecast disease trajectories at the individual level. The project also aims to develop utility tools that uncover key mechanistic insights and guide personalized care. This pilot will evaluate the platform’s potential to provide robust, trustworthy AI-driven solutions that enhance Alzheimer’s research, improve treatment strategies, and support precision healthcare.

Advancing Data Science and AI Research Through Strategic Investment

The DSI Seed Fund Program is one of the many ways the Data Science Institute enhances the scope and impact of data science and AI research at Columbia. By fostering interdisciplinary partnerships, the program accelerates promising research, ensuring that AI serves not just technological progress but societal benefit, advancing the Data Science Institute’s ethos of Data for Good.

The 2025 Seed Fund awardees will begin their projects on July 1, 2025. Their results are expected to drive further advances in AI applications and to open external funding opportunities.

Find out more about the DSI Seed Fund Program.