Data Science Day 2023

Wednesday, April 19, 2023 (8:00 AM – 5:00 PM)

Data Science Day provides a forum for innovators in academia, industry, and government to connect. The April 19, 2023 event featured a keynote presentation from Manuela Veloso, Head of J.P. Morgan Chase AI Research and Herbert A. Simon University Professor Emerita at Carnegie Mellon University; three sessions of Columbia-led lightning talks; interactive posters and technology demonstrations; and remarks from Lee C. Bollinger, President of Columbia University. Clifford Stein, Interim Director of the Data Science Institute, Wai T. Chang Professor of Industrial Engineering and Operations Research, and Professor of Computer Science, was the master of ceremonies. The 2023 event marked the in-person return to Alfred Lerner Hall on the Morningside campus.

Event Stats

  • 500+ attendees
  • 85 research projects exhibited, including 79 posters and 6 demos

Data Science Day Archive


2023 Keynote Speaker

Manuela Veloso, Head of J.P. Morgan Chase AI Research; and Herbert A. Simon University Professor Emerita at Carnegie Mellon University

Manuela Veloso is Head of J.P. Morgan Chase AI Research and Herbert A. Simon University Professor Emerita at Carnegie Mellon University, where she was previously Faculty in the Computer Science Department and Head of the Machine Learning Department. Her recent interests are in Artificial Intelligence (AI), Symbiotic Human-Robot Autonomy, Continuous Learning Systems, and AI in Finance. She is past President of the Association for the Advancement of Artificial Intelligence (AAAI), and the co-founder and a past President of the RoboCup Federation. In her career she has received numerous awards and honors, including: National Science Foundation CAREER Award, Allen Newell Medal for Excellence in Research, Radcliffe Fellow, Einstein Chair Professor of the Chinese Academy of Sciences, and the ACM/SIGART Autonomous Agents Research Award. Veloso is a Fellow of AAAI, AAAS, ACM, and IEEE. She was elected in 2022 to the National Academy of Engineering for her “contributions to artificial intelligence and its applications in robotics and the financial service industry.”

Talk Title: Symbiotic Human-AI Interaction: Experience-Based Insights from the Finance Domain

Abstract: In this talk, I share insights on how humans and AI can interact to jointly solve complex end-to-end problems, particularly in the financial domain. I will focus on data discovery, data standardization, synthetic data generation through simulations, data reconciliation, and explainability. I will present the challenges and opportunities for a symbiotic interaction between humans with their principles, knowledge, and experience, and AI with its ability to learn from data and from feedback. The talk will include a discussion on the use of LLMs in multiple tasks.

Moderated By: Jeannette M. Wing, Executive Vice President for Research and Professor of Computer Science, Columbia University


Presidential Remarks

Lee C. Bollinger, President, Columbia University, joins the event to give remarks on the impact of data science and the Data Science Institute. Bollinger will be joined on stage by Clifford Stein (current DSI Interim Director), Jeannette M. Wing (former DSI Avanessians Director), and Kathleen R. McKeown (DSI’s founding Director) for a special ceremony to recognize his presidency and his contributions to the establishment of the Data Science Institute.

Photo Credit: Eileen Barroso


2023 Lightning Talks

The Human in AI Systems

Kelton Minor
Postdoctoral Research Scientist, Data Science Institute, Columbia University

Talk Title: Global Monitoring of Emotional Responses to Climate Extremes: Evidence from Eight Billion Social Media Posts

Abstract: Climate change is intensifying regional heat and precipitation extremes, posing complex risks to human well-being on a planetary scale. Can pairing digital data streams with NLP provide a tool to track the hidden human impacts of climate stressors on daily life? In this talk, I’ll share key insights from a global-scale natural experiment that linked the lexical content of ~8 billion geolocated tweets across 190 countries and 13 languages with daily data on local climate extremes and weather conditions. Constructing historical sentiment atlases for nearly every county in the world, I’ll assess whether local exposure to randomly timed climate hazards alters positive and negative online expressions compared to local baselines. Lastly, I’ll describe societal sentiment responses to two events statistically attributed to human-caused climate change: the 2021 U.S. Pacific Northwest heatwave and the Western European extreme rainfall event. These results starkly reveal a fundamental aspect of human responses to emerging climatic extremes: future psychosocial impacts may far exceed those registered in the recent past, barring adaptation beyond what society has already achieved.

Kaveri Thakoor
Assistant Professor of Ophthalmic Science (in Ophthalmology), Department of Ophthalmology, Columbia University Irving Medical Center

Talk Title: Creating a Robust, Interpretable, and Portable Medical-Expert–AI Team for Eye Disease Detection

Abstract: The focus of our Artificial Intelligence for Vision Science (AI4VS) Lab is to develop AI ‘partners’ to work in tandem with clinicians to expedite eye disease detection. Our lab has three key goals: (1) to robustly handle data collected from different sites/patient populations, (2) to ensure the mechanisms behind AI’s predictions are interpretable by medical experts, and (3) to create AI technology that is portable so it can reach those populations most in need. In this lightning talk, I will give an overview of our ongoing work toward tackling these three challenges, showcasing how symbiotic expert-AI teammates may be able to achieve better disease detection accuracy and interpretability than either one alone.

Maxim Topaz
Elizabeth Standish Gill Associate Professor of Nursing, School of Nursing, Columbia University Medical Center
Talk Title: Transforming Patient Care at Home with AI

Abstract: This presentation will cover the current trends in using AI in healthcare with examples in home healthcare. It will specifically look at how AI is being used to identify high-risk patients who need priority for nursing visits and automatically identify patients who are deteriorating. The presentation will also provide examples of studies using AI for speech recognition technologies to identify at-risk patients. Recommendations on future directions for research will be provided.

Sandra C. Matz
Daniel W. Zalaznick Associate Professor of Business, Columbia Business School
Talk Title: Using Big Data as a Window into People’s Psychology

Abstract: Every step you take online leaves a digital footprint. What can these footprints teach us about their owner’s preferences, needs and motivations – in short, their personality? How can such insights be used (or abused) to influence people’s behavior? And what might a future look like in which individuals benefit more from their data than they currently do?

Moderator: Anthony Vanky
Assistant Professor, Graduate School of Architecture, Planning, and Preservation, Columbia University


Data Driven Finance for Society and Wall Street

Harry Mamaysky
Professor of Professional Practice in the Faculty of Business; and Faculty Director, Program for Financial Studies, Columbia Business School
Talk Title: Credit Information in Earnings Calls

Abstract: We develop a novel technique to extract credit-relevant information from the text of quarterly earnings calls. This information is not spanned by fundamental or market variables and forecasts future credit spread changes. One reason for such forecastability is that our text-based measure predicts future credit spread risk and firm fundamentals. Greater firm- and call-level complexity increases the forecasting power of our measure for spread changes. Out-of-sample portfolio tests show the information in our measure is valuable for investors. Our results suggest that investors do not fully internalize the credit-relevant information contained in earnings calls.

Agostino Capponi
Associate Professor of Industrial Engineering and Operations Research, Columbia Engineering
Talk Title: Mutual Funds: First-Mover Investors, Redemptions, and Spillover Risk

Abstract: We study the vulnerability of mutual funds to fire-sale spillover losses. We account for the first-mover incentive that results from the mismatch between the liquidity offered to redeeming investors and the liquidity of assets held by the funds. We show that a higher concentration of first movers increases the aggregate vulnerability of the mutual fund system. When calibrated to U.S. mutual funds, our model shows that, in stressed market scenarios, spillover losses are significantly amplified through a nonlinear response to initial shocks that results from the first-mover incentive. Higher spillover losses provide a stronger incentive to redeem early, further increasing fire-sale losses and the transmission of shocks through overlapping portfolio holdings. (joint work with Paul Glasserman and Marko Weber)

Mario Small
Quetelet Professor of Social Science, Faculty of Arts and Sciences, Columbia University
Talk Title: Financial Institutions, Neighborhoods, and Racial Inequality

Abstract: Does living in a minority neighborhood make conventional banking harder? Based on more than 6 million queries, we compute the difference in the time required to walk, drive, or take public transit to the nearest bank vs. the nearest alternative financial institution (AFI, such as a payday lender) from the middle of every block in each of 19 of the nation’s largest cities. We find that race is strikingly more important than class, as the AFI is more often closer than the bank in well-off minority neighborhoods than in poor white ones. I present some ideas about why.

Moderator: Yao Lu
Professor of Sociology, Faculty of Arts and Sciences, Columbia University


AI Generating Code, Dialog and Vision

Baishakhi Ray
Associate Professor of Computer Science, Columbia Engineering
Talk Title: Programming Language Processing: How AI can Revolutionize Software Development?

Abstract: The past decade has seen unprecedented growth in Software Engineering: developers spend enormous time and effort to create new products. With such enormous growth comes the responsibility of producing and maintaining quality and robust software. However, developing such software is non-trivial: 50% of software developers’ valuable time is wasted on finding and fixing bugs, costing the global economy around USD 1.1 trillion. Today, I will discuss how AI can help in different stages of the software development life cycle for developing quality products. In particular, I will talk about Programming Language Processing (PLP), an emerging research field that can model different aspects of code (source, binary, execution, etc.) to automate diverse Software Engineering tasks, including code generation, bug finding, security analysis, etc.

Zhou Yu
Associate Professor of Computer Science, Columbia Engineering
Talk Title: Seamless Natural Communication

Abstract: My research focuses on how to enable machines to interact with different users in a seamless fashion. To achieve that, I work on multimodal user modeling, dialog system planning, natural language understanding and generation, and human-computer interaction.

Carl Vondrick
Associate Professor of Computer Science, Columbia Engineering
Talk Title: Connecting Vision, Language, and Code for Explainable and Reprogrammable AI

Abstract: Vision-language models (VLMs) such as CLIP have shown promising performance on a variety of recognition tasks using the standard zero-shot classification procedure — computing similarity between the query image and the embedded words for each category. By only using the category name, they neglect to make use of the rich context of additional information that language affords. The procedure gives no intermediate understanding of why a category is chosen, and furthermore provides no mechanism for adjusting the criteria used towards this decision. We present an alternative framework for classification with VLMs, which we call classification by description. We ask VLMs to check for descriptive features rather than broad categories: to find a tiger, look for its stripes; its claws; and more. By basing decisions on these descriptors, we can provide additional cues that encourage using the features we want to be used. In the process, we can get a clear idea of what features the model uses to construct its decision; it gains some level of inherent explainability. We query large language models (e.g., GPT-3) for these descriptors to obtain them in a scalable way. Extensive experiments show our framework has numerous advantages past interpretability. We show improvements in accuracy on ImageNet across distribution shifts; demonstrate the ability to adapt VLMs to recognize concepts unseen during training; and illustrate how descriptors can be edited to effectively mitigate bias compared to the baseline.

Moderator: Eric L. Talley
Isidor and Seville Sulzbacher Professor of Law, Columbia Law School


Thank You

DSI Industry Affiliates Program.