Data Science Day 2025

Wednesday, April 2, 2025 (8:00 AM – 5:00 PM)

Alfred Lerner Hall, Columbia University Morningside Campus
Address: 2920 Broadway, New York, NY 10027

The Data Science Institute’s flagship annual event connects innovators in industry and government to Columbia researchers who are propelling advances across every sector with data science. The 2025 event will feature a keynote presentation from Rick Rioboli, Executive Vice President and Chief Technology Officer, Comcast Connectivity and Platforms; three sessions of Columbia-led lightning talks; and 100+ interactive posters and technology demonstrations.

Register


2025 Program

8:00 AM: Doors Open for Registration and Breakfast

Please plan to arrive early to check in, have breakfast, and find your seat.

9:00 AM: Opening Remarks

Garud Iyengar, Avanessians Director of the Data Science Institute and Professor of Industrial Engineering and Operations Research, will be the master of ceremonies.


9:05 AM: Session 1

The Future of Data Science and AI

AI is entering a new phase of innovation, with emerging technologies reshaping its capabilities. Small Language Models are offering more efficient and specialized alternatives to large AI systems, while causal AI is transforming decision-making by uncovering cause-and-effect relationships. As AI adoption accelerates, researchers are also focusing on energy-efficient computing for sustainability and embodied intelligence for enhanced human-AI interaction. The Future of Data Science and AI will examine how these advancements are driving AI toward greater adaptability, efficiency, and impact across industries.

Speakers:

  • Vishal Misra, Professor of Computer Science; Vice Dean of Computing and Artificial Intelligence, Columbia Engineering
  • Smaranda Muresan, Associate Professor of Computer Science, Barnard College
  • Matei Ciocarlie, Associate Professor of Mechanical Engineering, Columbia Engineering
  • David Sandalow, Inaugural Fellow at the Center on Global Energy Policy (CGEP); and Co-Director of the Energy and Environment Concentration, School of International and Public Affairs
  • Moderator: Dhrumil Mehta, Associate Professor in Data Journalism; and Deputy Director of the Tow Center for Digital Journalism, Columbia Journalism School

10:00 AM: Keynote

Rick Rioboli, Executive Vice President and Chief Technology Officer, Comcast Connectivity and Platforms

Rick Rioboli has over 25 years of experience leading large-scale technology teams, bringing innovative customer-facing products to market. He has led technology transformations in customer experience, AI, and data privacy, earning three Technology and Engineering Emmys.

Talk: AI in Large Enterprises… Crossing the Value Chasm

As AI technology continues to evolve at a breakneck pace, large enterprises are scrambling to leverage it to transform their businesses. In this talk, Rick Rioboli will explore the reasons behind the gap between AI’s potential and its actual realized value in large enterprises. He will also discuss which types of business problems are best suited to current AI capabilities, the forms of AI being used to address these challenges, and some examples of where AI is being used in large-scale production. Finally, he will assess where advances in technology are needed to drive wider adoption in enterprises and generate real, transformational business value.


11:00 AM: Coffee Break


11:15 AM: Session 2

Secure Data Science: Detect, Defend, Deter

Effective data science is vital for strengthening cybersecurity and maintaining digital integrity. As cyberattacks become more sophisticated, researchers and engineers are advancing methods to detect and counter emerging risks like deepfakes and data manipulation. Secure Data Science: Detect, Defend, Deter explores data-driven security strategies, responsible data-sharing frameworks, and innovative approaches to bolstering digital trust and resilience in an evolving cybersecurity landscape.

Speakers:

  • Junfeng Yang, Professor of Computer Science, Columbia Engineering
  • Rebecca Wright, Druckenmiller Professor and Chair of Computer Science, Director of the Vagelos Computational Science Center, Barnard College
  • Jason Healey, Senior Research Scholar, School of International and Public Affairs
  • Moderator: Daniel Richman, Paul J. Kellner Professor of Law, Columbia Law School

12:00 PM: Session 3

The Science of Self Care: Innovations in Consumer Health

Advancements in consumer health and wellness are reshaping how individuals monitor and manage their well-being. Sophisticated wearables and diagnostic tools are making personalized healthcare, predictive analytics, and early disease detection more accessible. The Science of Self Care: Innovations in Consumer Health explores the evolution of tailored health strategies and the integration of technology into daily wellness routines, highlighting the growing impact of data-driven insights on proactive health management.

Speakers:

  • Noémie Elhadad, Associate Professor of Biomedical Informatics; and Chair, Department of Biomedical Informatics, Vagelos College of Physicians and Surgeons
  • Carri Chan, John A. Howard Professor of Business, Columbia Business School
  • Drago Plecko, Postdoctoral Research Scientist in the Department of Computer Science, Columbia Engineering
  • Moderator: Lena Mamykina, Associate Professor of Biomedical Informatics, Vagelos College of Physicians and Surgeons

12:45 PM: Closing Remarks


1:00 PM: Lunch, Posters and Demos

Meet and network with Columbia researchers who are shaping the future of data science as they showcase the next generation of methods and applications. Explore 100+ projects spanning diverse disciplines, including business, healthcare, cybersecurity, technology, energy, climate science, public policy, and the arts.


4:00 PM: Attendee Networking Reception


5:00 PM: Event Ends


Registration

Ticket Rates

  • DSI Industry Affiliates: Data Science Day is made possible by the support of the DSI Industry Affiliates Program. Current employees of DSI Industry Affiliate Companies receive complimentary admission. Please email datascience@columbia.edu to receive a code to register.
  • Columbia University Students: $20
  • Columbia University Faculty, Staff and Researchers: $75
  • Columbia University Alumni: $75
  • General Admission: $200
  • Exhibitors: Columbia researchers who are exhibiting will receive free admission.

DSI Industry Affiliate Companies

Logos of DSI Industry Affiliate Companies



Abstracts & Speaker Information


Session 1: Abstracts

Vishal Misra, Professor of Computer Science; Vice Dean of Computing and Artificial Intelligence, Columbia Engineering

   

Title: LLMs are Bayesian Learners

   
Abstract: Recent breakthroughs in Large Language Models (LLMs) reveal a surprising capacity for in-context learning; given just a few prompt examples, LLMs can adapt to new tasks or domains. In this talk, Misra will present a unifying perspective that interprets LLMs as approximate Bayesian learners. First, the talk revisits the “Beyond the Black Box” framework, which views LLMs as approximating an enormous matrix that maps each prompt to a next-token distribution—effectively performing Bayesian updates as they consume new context. Then, the talk will turn to a more local geometric perspective—“The Bayesian Geometry of LLMs”—showing how cross-entropy training shapes key-query vectors to act like posterior probabilities in the attention mechanism. By examining phenomena such as code completion, domain-specific language prompts, and chain-of-thought reasoning, Misra illustrates how strongly predictive “subsequence snippets” dominate the LLM’s attention and override generic prior knowledge. He concludes by discussing implications for prompt engineering, hallucination control, and future directions in retrieval-augmented or alternative LLM architectures. Overall, this talk bridges two complementary viewpoints, providing a robust explanation for how LLMs transition so effortlessly from generic completions to specialized in-context behavior.
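
The Bayesian-update idea in this abstract can be illustrated with a toy model. The sketch below is not taken from the talk: it assumes a simple Dirichlet-multinomial model over a three-word vocabulary, in which hypothetical pseudo-counts play the role of generic prior knowledge and the tokens in a prompt act as new evidence. The function name and all numbers are illustrative only.

```python
from collections import Counter
from fractions import Fraction


def next_token_distribution(prior_counts, context):
    """Posterior predictive over tokens under a Dirichlet-multinomial model.

    prior_counts: pseudo-counts standing in for generic prior knowledge
                  (hypothetical values, not from any real model).
    context: tokens observed in the prompt, treated as new evidence.
    """
    counts = Counter(prior_counts)   # start from the prior pseudo-counts
    counts.update(context)           # add one count per observed context token
    total = sum(counts.values())
    return {token: Fraction(n, total) for token, n in counts.items()}


prior = {"cat": 3, "dog": 3, "tensor": 1}   # generic prior favors everyday words
prompt = ["tensor"] * 6                      # domain-specific context

before = next_token_distribution(prior, [])
after = next_token_distribution(prior, prompt)

print(before["tensor"])  # 1/7
print(after["tensor"])   # 7/13
```

In this toy setting, six in-context occurrences are enough for the specialized token to overtake the prior, which loosely mirrors the abstract's point that strongly predictive context can override generic prior knowledge.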

Smaranda Muresan, Associate Professor of Computer Science, Barnard College

   

Title: Human-centric Natural Language Processing for Social Good

   
Abstract: Large language models (LLMs) constitute a paradigm shift in Natural Language Processing (NLP) and its applications across all domains. To move towards human-centric NLP designed for social good, this talk argues that we need knowledge-aware NLP systems and human-AI collaboration frameworks. NLP systems that interact with humans need to be knowledge-aware (e.g., commonsense, sociocultural norms) and context-aware (e.g., social, perceptual) so that they communicate better and more safely with humans. Moreover, NLP systems should be able to collaborate with humans to create high-quality datasets for training and/or evaluating NLP models, to help humans solve tasks, and ultimately to align better with human values. In this talk, Muresan gives a brief overview of her lab’s research around NLP for social good, such as NLP for public health, creativity support, and building NLP technologies with linguistic and cultural diversity in mind.

Matei Ciocarlie, Associate Professor of Mechanical Engineering, Columbia Engineering

   

Title: Robotics and Embodied Intelligence: Is the Age of the General-Purpose Robot Around the Corner?
   
Abstract: In recent years, motor control methods based on machine learning have revolutionized the field of robotics, from locomotion to manipulation. Building on this change, large vision-language-action models now promise to provide the final missing link, giving robots the semantic intelligence needed to operate in ever-changing, human environments. Does this mean that the general-purpose robot, a long-held aspiration of the field that for decades seemed to remain out of reach indefinitely, is suddenly about to become a reality?

David Sandalow, Inaugural Fellow at the Center on Global Energy Policy (CGEP); and Co-Director of the Energy and Environment Concentration, School of International and Public Affairs

   

Title: AI and Climate Change Mitigation
   
Abstract: AI has significant potential to help reduce greenhouse gas emissions. ML tools are improving the productivity of solar power plants and energy efficiency of buildings. AI can help accelerate materials innovation, with potentially transformational benefits for energy storage, wind power and nuclear power. Other examples abound. At the same time, AI poses significant risks with respect to climate change. We explore current estimates of the greenhouse gas impacts of AI, the potential for AI to contribute to climate change mitigation and climate risks posed by AI. We conclude with recommendations for maximizing the potential benefits and minimizing the risks of AI for climate change mitigation.


Session 2: Abstracts

Junfeng Yang, Professor of Computer Science, Columbia Engineering

   

Title: Robust Deepfake Detection via Rewriting

   
Abstract: Generative AI supercharges productivity but also fuels abuse, from phishing and propaganda to scams such as a $25M deepfake video-call heist and voice-mimicking kidnapping ploys. Current detectors for AI-generated content often target specific models or datasets, and their accuracy falters when the model or data distribution shifts. Yang’s talk presents Raidar, a simple, robust detection method. Raidar takes input content x, prompts another AI to rewrite it as output y, and flags x as AI-generated if the edits between x and y are minimal. Raidar is agnostic to the generative model or process, so it is inherently robust on new content. Extensive evaluation shows that Raidar improves detection accuracy by up to 26%, remaining robust across diverse datasets and models, including under the toughest adversarial attack. Yang’s follow-up work extends Raidar’s approach to deepfake voice and video. Results illustrate the unique imprint of machine-generated content through the lens of the machines themselves.
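
The rewrite-and-compare step described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Raidar implementation: the `rewrite` callable stands in for a prompt to a language model, the similarity measure (Python's `difflib.SequenceMatcher`) and the 0.15 threshold are illustrative choices, and the two stand-in rewriters exist only so the example runs without a model.

```python
from difflib import SequenceMatcher
from typing import Callable


def edit_fraction(original: str, rewritten: str) -> float:
    """Rough share of the text changed by the rewrite (0.0 means identical)."""
    return 1.0 - SequenceMatcher(None, original, rewritten).ratio()


def looks_ai_generated(
    text: str,
    rewrite: Callable[[str], str],
    threshold: float = 0.15,  # illustrative value, not from the talk
) -> bool:
    """Flag text as likely AI-generated when the rewriter barely changes it."""
    return edit_fraction(text, rewrite(text)) < threshold


# Stand-in rewriters for demonstration only; a real system would prompt an LLM.
def light_touch(text: str) -> str:
    # Makes a single small word substitution.
    return text.replace("utilize", "use")


def heavy_handed(text: str) -> str:
    # Reverses the word order, changing most of the text.
    return " ".join(reversed(text.split()))


sample = "We utilize data to inform every decision we make."
print(looks_ai_generated(sample, light_touch))   # True: few edits were needed
print(looks_ai_generated(sample, heavy_handed))  # False: the rewrite changed a lot
```

Because the check looks only at how much the rewriter changes the input, it needs no knowledge of which model produced the text, which is the property the abstract highlights.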

Rebecca Wright, Druckenmiller Professor and Chair of Computer Science, Director of the Vagelos Computational Science Center, Barnard College

   

Title: Accountability in Computing

   
Abstract: With the increased reliance on computing systems and AI-based decision making, accountability in computing is more important than ever. While “accountability” is used often in describing computer-security mechanisms that complement preventive security, it lacks a precise, agreed-upon definition. This talk will discuss the need for accountability in computing and some of the many ways that the term is used. We will also explore potential tradeoffs between accountability and other potentially desirable properties, such as privacy.

Jason Healey, Senior Research Scholar, School of International and Public Affairs

   

Title: Is Cyber Defense Winning?
   
Abstract: Based on the National Cybersecurity Strategy, the US government is working to rebalance “the advantage to [cyber] defenders and perpetually frustrating the forces that would threaten it” through “fundamental changes to the underlying dynamics of the digital ecosystem.” But how will defenders – flooded with ambiguous and context-free data – know if they are shifting the balance? For example, “Google observed 97 zero-day vulnerabilities … in 2023, over 50 percent more than 2022.” Does such a large increase in zero-days, which only become known to defenders when they are used as part of an attack, mean defenders are doing better, or worse? Without a larger framework, there is no way to know. This talk summarizes the first such framework, which categorizes, from the vast universe of metrics, the very few which shed light on the system-wide offense-defense competition for advantage, so that determining success is as data-driven as possible.


Session 3: Abstracts

Noémie Elhadad, Associate Professor of Biomedical Informatics; and Chair, Department of Biomedical Informatics, Vagelos College of Physicians and Surgeons

   

Title: AI for Endometriosis: Enhancing Detection, Management, and Self-Care

   
Abstract: Endometriosis is one of several women’s health conditions that remain underdiagnosed and lack clear guidance for care, leaving millions to navigate their symptoms with little support. AI has the potential to bridge these gaps by improving detection, characterization, and self-management strategies through patient-driven and real-world data. This talk will highlight the Citizen Endo project, including the Phendo app, as a model to capture the lived experiences of patients, generate new insights about this disease, and help patients manage their symptoms. By focusing on endometriosis, Elhadad explores how AI can transform the landscape of women’s health more broadly, ensuring that under-researched conditions receive the attention they deserve.

Carri Chan, John A. Howard Professor of Business, Columbia Business School

   

Title: Data-Driven Transformations for Access, Costs, and Care: Prediction-Driven Staffing in the Emergency Department
   
Abstract: The ever-growing demand for healthcare services strains existing resources, including skilled professionals, infrastructure, and time. Artificial intelligence offers a potential path to address these critical gaps. Chan discusses how predictive models of patient demand that use a combination of historical data and real-time information can be leveraged to optimize nurse staffing at hospitals, resulting in improved Emergency Department efficiency, accessibility, and cost savings. The talk examines the implementation of AI-driven staffing solutions and discusses their impact on key healthcare metrics.

Drago Plecko, Postdoctoral Research Scientist in the Department of Computer Science, Columbia Engineering

   

Title: Causal Fairness Analysis for Causal Health Equity

   
Abstract: In this talk, Plecko describes the three basic tasks of Causal Fairness Analysis: (i) bias detection, (ii) fair prediction, and (iii) fair decision-making. For each task, he also mentions relevant real-world examples in an attempt to build a catalog of different fairness settings. Plecko also describes how Causal Fairness Analysis can be used to explain racial and ethnic disparities following admission to an intensive care unit (ICU). The analysis reveals that minority patients are much more likely to be admitted to the ICU, and that this increase in admission is linked with a lack of access to primary care. This led Plecko to construct the Indigenous Intensive Care Equity (IICE) Radar, a monitoring system for tracking the over-utilization of ICU resources by the Indigenous population of Australia across geographical areas, opening the door for targeted public health interventions aimed at improving health equity.


Rick Rioboli: Biography


Rick Rioboli serves as Executive Vice President and Chief Technology Officer of Comcast Connectivity and Platforms. In this role, he leads the technology organization that drives the innovation, development and management of the global platforms that power both customer and employee experiences, as well as the technology, architecture, and tools that underpin them. Over the past 18 years, Rick’s teams have been instrumental in the technology transformation propelling critical strategic initiatives at Comcast in customer experience, digital first, data privacy, and artificial intelligence and machine learning. They have been awarded three Technology and Engineering Emmys in recognition of the impact of their work on television engineering.
   
Rick has over 25 years of experience leading product and engineering teams focused on bringing innovative technologies and customer-facing products to market. He has been recognized as a Forbes CIO Top 50 Innovative Technology Leader and has received the Philly CIO of the Year Leadership and the Cable TV Pioneer awards.
   
Rick received his Bachelor’s in Electrical Engineering from Penn State University and his Master’s in Computer Engineering from Villanova University. He holds two patents related to wireless technology. Rick is an advocate for ongoing education and career development both at Comcast and across the Philadelphia tech community and is an active member of Drexel University’s College of Computing Executive Advisory Council.