Data Science Day 2021

Wednesday, April 21, 2021

Data Science Day provides a forum for innovators in academia, industry, and government to connect. The April 21, 2021 virtual event featured two sessions of lightning talks from leading Columbia University faculty members; interactive demonstrations and posters; and a keynote address from Pat Bajari, a Chief Economist at Amazon and Vice President of Amazon’s Core AI team.

Event Stats

  • 760+ live viewers
  • 1,160+ virtual “check-ins” to the Data Science Day attendee website
  • 2,500+ views of the Data Science Day program within a 24-hour period
  • 7,765 website visits to the Data Science Day attendee website between April 20 and April 22, 2021

Read our Recap

All speakers and their respective roles/titles are accurate as of the time of the event (2021).

2021 Keynote Speaker

Pat Bajari, Chief Economist and Vice President, Amazon Core AI

Pat Bajari is a Chief Economist at Amazon and Vice President of Amazon’s Core AI team. His team of software engineers and scientists in machine learning, statistics, operations research, and econometrics has helped to build scalable systems for supply chain, transportation, pricing, automated marketing, robotics, forecasting, human resources, and more. Prior to joining Amazon, he was a full-time faculty member in economics at Harvard, Stanford, Duke, and Minnesota.

2021 Lightning Talks

Human + Machine: A New Hybrid World

Data science is an ever-evolving and expanding field. Here, we explore ways in which it has become an integral part of the decision-making and optimization of countless fields, including patient care, B2B, and design.

Oded Netzer
Arthur J. Samberg Professor of Business, Columbia Business School

Talk Title: Salespeople automation: A Human-Machine Hybrid Approach

Abstract: In a world advancing towards automation, we propose a human-machine hybrid approach to automating decision making in high human interaction environments and apply it in the business-to-business (B2B) retail context. Using sales transactions data from a B2B retailer, we create an automated version of each salesperson that learns and automatically reapplies the salesperson’s pricing policy. We conduct a field experiment with the B2B retailer, providing salespeople with their own model’s price recommendations in real time through the retailer’s CRM system, and allowing them to adjust their original pricing accordingly. We find that despite the loss of non-codeable information available to the salesperson but not to the model, providing the model’s price to the salesperson increases profits for treated quotes by 11% relative to a control condition. Using a counterfactual analysis, we show that while in most cases the model’s pricing leads to higher profitability by eliminating inter-temporal human biases, the salesperson generates higher profits when pricing special quotes with unique or complex characteristics. Accordingly, we propose a machine learning hybrid pricing strategy that automatically allocates quotes to the model or to the human expert and generates profits significantly higher than either the model or the salespeople.

Lydia Chilton
Assistant Professor of Computer Science, Columbia Engineering

Talk Title: AI Tools for Design and Innovation

Abstract: How can computational tools and AI help people be better at innovation and creative problem-solving? When solving a problem, people have the tendency to fixate on one problem or solution. If that one idea doesn’t work, they get stuck. To avoid getting stuck, the design process encourages people to have multiple ideas and explore the space of possibilities before deciding on a problem or a solution. Although this works, it is highly complex, requiring people to follow many threads at once. We show how AI and other computational tools can help simplify and speed up the most cognitively taxing aspects of the design process:

  1. Collecting multiple partial solutions
  2. Synthesizing partial solutions into multiple prototypes
  3. Quickly iterating on prototypes to produce an MVP 

Sarah Collins Rossetti
Assistant Professor of Biomedical Informatics and Nursing, Columbia University

Talk Title: Exploiting the Signal Gain of Clinician Expertise in a Predictive Early Warning Score and CDS tool using Nursing EHR data

Abstract: Signals of clinical expertise and knowledge-driven behaviors within EHRs can be exploited to enhance predictive model performance, while increasing interpretability. The scientific premise of the CONCERN study is that while clinicians strive to provide the best care, there is a systematic problem within hospital settings of non-optimal communication between nurses and physicians leading to care delays for at-risk patients. The CONCERN model uses novel signals from nursing documentation, including natural language processing of notes, that are proxies of a nurse’s concern to predict patients at risk of deterioration. Preliminary findings include improved performance and lead time compared to leading early warning scores. Our sharable, standards-based, user-centered clinical decision support CONCERN SmartApp surfaces nurses’ concerns to the interprofessional care team and is being evaluated in a clinical trial across two large academic medical centers to decrease patient deterioration.

Courtney D. Cogburn
Associate Professor of Social Work, Columbia School of Social Work


Cause, Learn, Optimize, and Reason

This session will highlight advancements in data science, bringing to light causation as opposed to correlation, the use of transfer learning for improving imperfect data, optimization for the improvement of graph problems, and reason to improve differential prediction.

Samory Kpotufe
Associate Professor, Department of Statistics, Columbia University

Talk Title: Big but Imperfect Data: Fundamental Challenges of Domain Adaptation

Abstract: In many ML applications, such as healthcare, IoT, and finance, perfectly representative data is hard to obtain. However, much data from related sources is often available, although not adequately representative of the target application. As such, many so-called ‘domain adaptation’ approaches have been developed to harness such large but imperfect data, often with a remarkable degree of success. However, a unified understanding of how and when such imperfect data can help remains elusive, making it hard to build upon previous successes. I’ll attempt to highlight key challenges and promising directions in this problem domain.

Melanie M. Wall
Professor of Biostatistics (in Psychiatry), Department of Biostatistics, Mailman School of Public Health, Columbia University

Talk Title: Data Science as the Engine for a Learning Health Care Service System for First Episode Psychosis in Coordinated Specialty Care

Abstract: A key initiative in research focused on treatment for first episode psychosis (FEP) is improving the implementation of evidence-based coordinated specialty care (CSC). One area of improvement is expected to come from improved data analytics facilitated by linking different clinical sites through common data elements and a unified informatics approach for aggregating and analyzing patient-level data. Through an NIMH-funded network and partnerships with the New York Office of Mental Health and the Columbia Department of Psychiatry, data science is contributing to a learning health care model. A few examples will be presented, including the extent to which predictive modeling of patient-level outcomes, based on background variables collected at intake and throughout care, can be used to differentiate individuals in a way that is useful. Presentation of results will focus on interpretability of differential prediction across sites and usefulness for facilitating service decisions.

Elias Bareinboim
Associate Professor, Department of Computer Science, Columbia University

Talk Title: Causal Data Science

Abstract: Causal inference provides a set of tools and principles that allows one to combine data and structural invariances about the environment to reason about questions of counterfactual nature — i.e., what would have happened had reality been different, even when no data about this imagined reality is available. Reinforcement Learning is concerned with efficiently finding a policy that optimizes a specific function (e.g., reward, regret) in interactive and uncertain environments. These two disciplines have evolved independently and with virtually no interaction between them. In reality, however, they operate over different aspects of the same building block, i.e., counterfactual relations, which makes them umbilically tied.

Clifford Stein
Professor, Industrial Engineering, Operations Research and Computer Science, Columbia Engineering

Talk Title: Parallel Algorithms for Massive Graphs

Abstract: Large graphs model many important problems in data science. When the graph is too large to fit in the memory of one computer, standard sequential algorithms do not work, or are so slow as to be useless. We will survey some recent progress on efficient parallel algorithms whose performance scales nicely with the size of the graph for many of the well-known basic graph problems such as connectivity, spanning trees, shortest paths and matchings.

Martha Kim
Associate Professor, Computer Science, Columbia University


60+ Posters and Demos Exhibited

Data Science Day included over 60 research projects from Columbia University faculty, students, and affiliated researchers. Attendees joined small breakout groups to meet and network with researchers who are building the next generation of data science methods and applications. The posters were exhibited according to their affiliation with the Data Science Institute’s research center and working group topic areas.

DSI Industry Affiliates have access to Data Science Day posters after the event. If you are a current DSI Industry Affiliate, please contact us for a link to the videos.


  • DSI Industry Affiliates have access to Data Science Day recordings after the event. If you are a current DSI Industry Affiliate, please contact us for a link to the videos.
  • The keynote address with Pat Bajari was not recorded.

Thank You

DSI Industry Affiliates Program