Wednesday, April 2, 2025, 8:00 am – 5:00 pm
Alfred Lerner Hall, Columbia University Morningside Campus
Address: 2920 Broadway, New York, NY 10027
The Data Science Institute’s flagship annual event connects innovators in industry and government to Columbia researchers who are propelling advances across every sector with data science. The 2025 event will feature a keynote presentation from Rick Rioboli, Executive Vice President and Chief Technology Officer, Comcast Connectivity and Platforms; three sessions of Columbia-led lightning talks; and 100+ interactive posters and technology demonstrations.
Please plan to arrive early to check in, have breakfast, and find your seat.
Garud Iyengar, Avanessians Director of the Data Science Institute and Professor of Industrial Engineering and Operations Research, will be the master of ceremonies.
AI is entering a new phase of innovation, with emerging technologies reshaping its capabilities. Small Language Models are offering more efficient and specialized alternatives to large AI systems, while causal AI is transforming decision-making by uncovering cause-and-effect relationships. As AI adoption accelerates, researchers are also focusing on energy-efficient computing for sustainability and embodied intelligence for enhanced human-AI interaction. The Future of Data Science and AI will examine how these advancements are driving AI toward greater adaptability, efficiency, and impact across industries.
Speakers:
Rick Rioboli, Executive Vice President and Chief Technology Officer, Comcast Connectivity and Platforms
Rick Rioboli has over 25 years of experience leading large-scale technology teams, bringing innovative customer-facing products to market. He has led technology transformations in customer experience, AI, and data privacy, earning three Technology and Engineering Emmys.
Talk: AI in Large Enterprises… Crossing the Value Chasm
As AI technology continues to evolve at a breakneck pace, large enterprises are scrambling to leverage this technology to transform their businesses. In this talk, Rick Rioboli will explore the reasons behind the gap between AI’s potential and its actual realized value in large enterprises. He will also discuss which types of business problems are best suited to be solved with current AI capabilities, the forms of AI being used to address these challenges, and some examples of where AI is being used in large scale production. Finally, he will assess where we need advances in technology to drive wider scale adoption in enterprises and generate real business transformational value.
Effective data science is vital for strengthening cybersecurity and maintaining digital integrity. As cyberattacks become more sophisticated, researchers and engineers are advancing methods to detect and counter emerging risks like deep fakes and data manipulation. Secure Data Science: Detect, Defend, Deter explores data-driven security strategies, responsible data-sharing frameworks, and innovative approaches to bolstering digital trust and resilience in an evolving cybersecurity landscape.
Advancements in consumer health and wellness are reshaping how individuals monitor and manage their well-being. Sophisticated wearables and diagnostic tools are making personalized healthcare, predictive analytics, and early disease detection more accessible. The Science of Self Care: Innovations in Consumer Health explores the evolution of tailored health strategies and the integration of technology into daily wellness routines, highlighting the growing impact of data-driven insights on proactive health management.
Meet and network with Columbia researchers who are shaping the future of data science as they showcase the next generation of methods and applications. Explore 100+ projects spanning diverse disciplines, including business, healthcare, cybersecurity, technology, energy, climate science, public policy, and the arts.
Vishal Misra, Professor of Computer Science; Vice Dean of Computing and Artificial Intelligence, Columbia Engineering
Title: LLMs are Bayesian Learners
Abstract: Recent breakthroughs in Large Language Models (LLMs) reveal a surprising capacity for in-context learning; given just a few prompt examples, LLMs can adapt to new tasks or domains. In this talk, Misra will present a unifying perspective that interprets LLMs as approximate Bayesian learners. First, the talk revisits the “Beyond the Black Box” framework, which views LLMs as approximating an enormous matrix that maps each prompt to a next-token distribution—effectively performing Bayesian updates as they consume new context. Then, the talk will turn to a more local geometric perspective—“The Bayesian Geometry of LLMs”—showing how cross-entropy training shapes key-query vectors to act like posterior probabilities in the attention mechanism. By examining phenomena such as code completion, domain-specific language prompts, and chain-of-thought reasoning, Misra illustrates how strongly predictive “subsequence snippets” dominate the LLM’s attention and override generic prior knowledge. He concludes by discussing implications for prompt engineering, hallucination control, and future directions in retrieval-augmented or alternative LLM architectures. Overall, this talk bridges two complementary viewpoints, providing a robust explanation for how LLMs transition so effortlessly from generic completions to specialized in-context behavior.
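The Bayesian-learner view can be made concrete with a toy mixture model (a minimal sketch for intuition, not Misra's actual formulation; the two "tasks," their vocabularies, and probabilities are all invented): a learner holds a posterior over candidate tasks and updates it by Bayes' rule as each context token arrives, so that predictions specialize from generic to task-specific behavior.

```python
# Toy "in-context learning as Bayesian updating" sketch.
# Two hypothetical tasks, each a next-token distribution over
# a tiny vocabulary; the learner updates a posterior over tasks
# as context tokens arrive.

tasks = {
    "english": {"the": 0.5, "cat": 0.3, "def": 0.2},
    "code":    {"the": 0.1, "cat": 0.1, "def": 0.8},
}
posterior = {"english": 0.5, "code": 0.5}  # uniform prior

def update(posterior, token):
    """Bayes' rule: P(task | token) is proportional to P(token | task) * P(task)."""
    unnorm = {t: p * tasks[t][token] for t, p in posterior.items()}
    z = sum(unnorm.values())
    return {t: v / z for t, v in unnorm.items()}

def predict(posterior):
    """Posterior-predictive next-token distribution (mixture over tasks)."""
    vocab = tasks["english"].keys()
    return {w: sum(posterior[t] * tasks[t][w] for t in tasks) for w in vocab}

# Code-like context shifts the posterior toward the "code" task,
# and predictions specialize accordingly -- the in-context learning effect.
for tok in ["def", "def"]:
    posterior = update(posterior, tok)
print(max(posterior, key=posterior.get))  # prints "code"
```

After two code-flavored tokens the posterior concentrates on the "code" task (about 0.94 here), mirroring how a few prompt examples can override an LLM's generic prior.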
Smaranda Muresan, Associate Professor of Computer Science, Barnard College
Title: Human-centric Natural Language Processing for Social Good
Abstract: Large language models (LLMs) constitute a paradigm shift in Natural Language Processing (NLP) and its applications across all domains. To move towards human-centric NLP designed for social good, this talk argues that we need knowledge-aware NLP systems and human-AI collaboration frameworks. NLP systems that interact with humans need to be knowledge-aware (e.g., commonsense, sociocultural norms) and context-aware (e.g., social, perceptual) so that they communicate better and more safely with humans. Moreover, NLP systems should be able to collaborate with humans to create high-quality datasets for training and/or evaluating NLP models, to help humans solve tasks, and ultimately to align better with human values. In this talk, Muresan gives a brief overview of her lab’s research on NLP for social good, including NLP for public health, creativity support, and building NLP technologies with linguistic and cultural diversity in mind.
Matei Ciocarlie, Associate Professor of Mechanical Engineering, Columbia Engineering
Title: Robotics and Embodied Intelligence: Is the Age of the General-Purpose Robot Around the Corner?
Abstract: In recent years, motor control methods based on machine learning have revolutionized the field of robotics, from locomotion to manipulation. Building on this change, large vision-language-action models now promise to provide the final missing link, giving robots the semantic intelligence needed to operate in ever-changing, human environments. Does this mean that the general-purpose robot, a long-held aspiration of the field that has for decades seemed perpetually out of reach, is suddenly about to become a reality?
David Sandalow, Inaugural Fellow at the Center on Global Energy Policy (CGEP); and Co-Director of the Energy and Environment Concentration, School of International and Public Affairs
Title: AI and Climate Change Mitigation
Abstract: AI has significant potential to help reduce greenhouse gas emissions. Machine learning tools are improving the productivity of solar power plants and the energy efficiency of buildings. AI can help accelerate materials innovation, with potentially transformational benefits for energy storage, wind power, and nuclear power. Other examples abound. At the same time, AI poses significant risks with respect to climate change. Sandalow explores current estimates of the greenhouse gas impacts of AI, the potential for AI to contribute to climate change mitigation, and the climate risks posed by AI, and concludes with recommendations for maximizing the benefits and minimizing the risks of AI for climate change mitigation.
Junfeng Yang, Professor of Computer Science, Columbia Engineering
Title: Robust Deepfake Detection via Rewriting
Abstract: Generative AI supercharges productivity but also fuels abuse, from phishing and propaganda to scams like a $25M deepfake video call heist and voice-mimicking kidnapping ploys. Current detectors for AI-generated content often target specific models or datasets, faltering with poor accuracy when the model or data distribution shifts. Yang’s talk presents Raidar, a simple, robust detection method. Raidar takes input content x, prompts another AI to rewrite it as output y, and flags x as AI-generated if the edits between x and y are minimal. Raidar is agnostic to the generative model or process, so it is inherently robust on new content. Extensive evaluation shows that Raidar improves detection accuracy by up to 26%, remaining robust across diverse datasets and models, including the toughest adversarial attacks. Yang’s follow-up work extends Raidar’s approach to deepfake voice and video. The results illustrate the unique imprint of machine-generated content through the lens of the machines themselves.
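The rewrite-and-compare idea described above can be sketched in a few lines. This is an illustrative outline only, not Raidar's actual implementation: the similarity measure, the 0.9 threshold, and the toy rewriter standing in for a real LLM call are all assumptions.

```python
import difflib

def edit_similarity(x: str, y: str) -> float:
    """Fraction of x preserved in the rewrite y (1.0 = unchanged)."""
    return difflib.SequenceMatcher(None, x, y).ratio()

def flag_ai_generated(text: str, rewrite, threshold: float = 0.9) -> bool:
    """Raidar-style test: ask a model to rewrite `text`; if the rewrite
    changes little (high similarity), the input is likely machine-generated.
    `rewrite` is any callable wrapping an LLM; the threshold is illustrative."""
    return edit_similarity(text, rewrite(text)) >= threshold

# Toy stand-in for an LLM rewriter: it leaves "machine-flavored" text
# nearly untouched but rephrases "human-flavored" text heavily.
def toy_rewriter(text: str) -> str:
    if text.startswith("In conclusion,"):
        return text  # the model sees nothing to improve
    return "A heavily rephrased version with entirely different wording."

print(flag_ai_generated("In conclusion, the results are significant.", toy_rewriter))  # True
print(flag_ai_generated("lol that heist story is wild, right??", toy_rewriter))       # False
```

The intuition matches the abstract: a rewriter model tends to "approve" text that already reads like its own output, so minimal edits are a signal of machine generation, independent of which model produced the original.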
Rebecca Wright, Druckenmiller Professor and Chair of Computer Science, Director of the Vagelos Computational Science Center, Barnard College
Title: Accountability in Computing
Abstract: With the increased reliance on computing systems and AI-based decision making, accountability in computing is more important than ever. While “accountability” is used often in describing computer-security mechanisms that complement preventive security, it lacks a precise, agreed-upon definition. This talk will discuss the need for accountability in computing and some of the many ways that the term is used. We will also explore potential tradeoffs between accountability and other potentially desirable properties, such as privacy.
Jason Healey, Senior Research Scholar, School of International and Public Affairs
Title: Is Cyber Defense Winning?
Abstract: Based on the National Cybersecurity Strategy, the US government is working to rebalance “the advantage to [cyber] defenders and perpetually frustrating the forces that would threaten it” through “fundamental changes to the underlying dynamics of the digital ecosystem.” But how will defenders – flooded with ambiguous and context-free data – know if they are shifting the balance? For example, “Google observed 97 zero-day vulnerabilities … in 2023, over 50 percent more than 2022.” Does such a large increase in zero-days, which only become known to defenders when they are used as part of an attack, mean defenders are doing better, or worse? Without a larger framework, there is no way to know. This talk summarizes the first such framework, which identifies, from the vast universe of metrics, the very few that shed light on the system-wide offense-defense competition for advantage, so that determining success is as data-driven as possible.
Noémie Elhadad, Associate Professor of Biomedical Informatics; and Chair, Department of Biomedical Informatics, Vagelos College of Physicians and Surgeons
Title: AI for Endometriosis: Enhancing Detection, Management, and Self-Care
Abstract: Endometriosis is one of several women’s health conditions that remain underdiagnosed and lack clear guidance for care, leaving millions to navigate their symptoms with little support. AI has the potential to bridge these gaps by improving detection, characterization, and self-management strategies through patient-driven and real-world data. This talk will highlight the Citizen Endo project, including the Phendo app, as a model to capture the lived experiences of patients, generate new insights about this disease, and help patients manage their symptoms. By focusing on endometriosis, Elhadad explores how AI can transform the landscape of women’s health more broadly, ensuring that under-researched conditions receive the attention they deserve.
Carri Chan, John A. Howard Professor of Business, Columbia Business School
Title: Data-Driven Transformations for Access, Costs, and Care: Prediction-Driven Staffing in the Emergency Department
Abstract: The ever-growing demand for healthcare services strains existing resources, including skilled professionals, infrastructure, and time. Artificial intelligence offers a potential path to address these critical gaps. Chan discusses how predictive models of patient demand that use a combination of historical data and real-time information can be leveraged to optimize nurse staffing at hospitals, resulting in improved Emergency Department efficiency, accessibility, and cost savings. The talk examines the implementation of AI-driven staffing solutions and discusses their impact on key healthcare metrics.
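One standard way to turn a demand forecast into a staffing decision is a newsvendor-style critical-fractile rule: staff up to the demand level whose cumulative probability reaches the ratio of understaffing cost to total cost. This is a generic textbook illustration, not Chan's actual model; the costs and the forecast below are invented.

```python
def staff_level(demand_forecast, c_understaff=4.0, c_overstaff=1.0):
    """Newsvendor-style rule: choose the smallest staffing level whose
    cumulative demand probability reaches the critical fractile
    c_u / (c_u + c_o). Costs here are illustrative, not calibrated.
    `demand_forecast`: dict mapping patient demand -> probability."""
    fractile = c_understaff / (c_understaff + c_overstaff)  # 0.8 here
    cum = 0.0
    for demand in sorted(demand_forecast):
        cum += demand_forecast[demand]
        if cum >= fractile:
            return demand
    return max(demand_forecast)

# Hypothetical forecast for one shift's ED arrivals needing a nurse.
forecast = {8: 0.1, 9: 0.2, 10: 0.4, 11: 0.2, 12: 0.1}
print(staff_level(forecast))  # prints 11: understaffing costs 4x overstaffing
```

The point of the sketch is the asymmetry: because understaffing (delayed care) is costlier than overstaffing (idle capacity), the optimal level sits above the median forecast, and better real-time forecasts tighten the distribution and reduce both kinds of cost.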
Drago Plecko, Postdoctoral Research Scientist in the Department of Computer Science, Columbia Engineering
Title: Causal Fairness Analysis for Causal Health Equity
Abstract: In this talk, Plecko describes the three basic tasks of Causal Fairness Analysis: (i) bias detection, (ii) fair prediction, and (iii) fair decision-making. For each task, he also mentions relevant real-world examples in an attempt to build a catalog of different fairness settings. Plecko also describes how Causal Fairness Analysis can be used to explain racial and ethnic disparities following admission to an intensive care unit (ICU). The analysis reveals that minority patients are much more likely to be admitted to the ICU, and that this increase in admissions is linked with a lack of access to primary care. This led Plecko to construct the Indigenous Intensive Care Equity (IICE) Radar, a monitoring system for tracking the over-utilization of ICU resources by the Indigenous population of Australia across geographical areas, opening the door for targeted public health interventions aimed at improving health equity.