Data Science Day 2021: Abstracts & Biographies

Browse the abstracts and biographies for Columbia faculty members speaking at Data Science Day 2021.

Held on April 21, 2021 (10:00 AM – 1:00 PM EDT)

Watch Columbia faculty-led lightning talks on YouTube here.

Human + Machine: A New Hybrid World

Data science is an ever-evolving and expanding field. Here, we explore ways in which it has become an integral part of the decision-making and optimization of countless fields, including patient care, B2B, and design.

Participating Speakers:

Oded Netzer
Arthur J. Samberg Professor of Business, Columbia Business School

Talk Title: Salespeople Automation: A Human-Machine Hybrid Approach

Abstract: In a world advancing towards automation, we propose a human-machine hybrid approach to automating decision making in high human interaction environments and apply it in the business-to-business (B2B) retail context. Using sales transactions data from a B2B retailer, we create an automated version of each salesperson that learns and automatically reapplies the salesperson’s pricing policy. We conduct a field experiment with the B2B retailer, providing salespeople with their own model’s price recommendations in real time through the retailer’s CRM system, and allowing them to adjust their original pricing accordingly. We find that despite the loss of non-codeable information available to the salesperson but not to the model, providing the model’s price to the salesperson increases profits for treated quotes by 11% relative to a control condition. Using a counterfactual analysis, we show that while in most cases the model’s pricing leads to higher profitability by eliminating inter-temporal human biases, the salesperson generates higher profits when pricing special quotes with unique or complex characteristics. Accordingly, we propose a machine learning hybrid pricing strategy that automatically allocates quotes to the model or to the human expert and generates profits significantly higher than either the model or the salespeople.

Bio: Professor Netzer’s expertise centers on one of the major business challenges of the data-rich environment: developing quantitative methods that leverage data to gain a deeper understanding of customer behavior and guide firms’ decisions. He focuses primarily on building statistical and econometric models to measure consumer preferences and understand how customer choices change over time, and across contexts. Most notably, he has developed a framework for managing firms’ customer bases through dynamic segmentation. More recently, his research focuses on leveraging text-mining techniques for business applications. See full details.

Lydia Chilton
Assistant Professor of Computer Science, Columbia Engineering

Talk Title: AI Tools for Design and Innovation

Abstract: How can computational tools and AI help people be better at innovation and creative problem-solving? When solving a problem, people tend to fixate on one problem or solution. If that one idea doesn’t work, they get stuck. To avoid getting stuck, the design process encourages people to have multiple ideas and explore the space of possibilities before deciding on a problem or a solution. Although this works, it’s highly complex, requiring people to follow many threads at once. We show how AI and other computational tools can help simplify and speed up the most cognitively taxing aspects of the design process:

Collecting multiple partial solutions
Synthesizing partial solutions into multiple prototypes
Quickly iterating on prototypes to produce an MVP

Bio: Lydia Chilton’s area of study is human-computer interaction with a focus on computational design, including viewing the design process from a computational standpoint. Two current projects are constructing visual metaphors for creative ads and using computational tools to write humor and news satire. Chilton received her PhD from the University of Washington in 2015. She received her Master’s in Engineering from MIT in 2009 and her SB in 2007, also from MIT. Prior to joining Columbia Engineering in 2017, she was a postdoctoral student at Stanford University. See full details.

Sarah Rossetti
Assistant Professor of Biomedical Informatics and Nursing, Columbia University

Talk Title: Exploiting the Signal Gain of Clinician Expertise in a Predictive Early Warning Score and CDS tool using Nursing EHR data

Abstract: Signals of clinical expertise and knowledge-driven behaviors within EHRs can be exploited to enhance predictive model performance, while increasing interpretability. The scientific premise of the CONCERN study is that while clinicians strive to provide the best care, there is a systematic problem within hospital settings of non-optimal communication between nurses and physicians leading to care delays for at-risk patients. The CONCERN model uses novel signals from nursing documentation, including natural language processing of notes, that are proxies of a nurse’s concern to predict patients at risk of deterioration. Preliminary findings include improved performance and lead time compared to leading early warning scores. Our sharable, standards-based, user-centered clinical decision support CONCERN SmartApp surfaces nurses’ concerns to the interprofessional care team and is being evaluated in a clinical trial across two large academic medical centers to decrease patient deterioration.

Bio: Sarah Collins Rossetti, RN, PhD is an Assistant Professor of Biomedical Informatics and Nursing at Columbia University. Prior to her appointment at Columbia, she was a Senior Nurse and Clinical Informatician at Brigham and Women’s Hospital in the Department of Medicine Division of Internal Medicine and Primary Care and an Instructor in Medicine at Harvard Medical School. Her research is focused on identifying and intervening on system-level weaknesses, particularly those related to poor communication and care coordination, that increase patient risk for harm within our healthcare system. She does this by applying computational tools to mine and extract value from electronic health record (EHR) data and by leveraging user-centered design of patient-centered and collaborative decision support tools. See full details.

Courtney D. Cogburn
Associate Professor of Social Work, Columbia School of Social Work (Moderator)

Bio: Associate Professor Courtney D. Cogburn employs a transdisciplinary approach to examining the role of racism in the production of racial inequalities in health. She is on the faculty of the Columbia Population Research Center and a faculty affiliate of the Center on African American Politics and Society and the Data Science Institute. The National Institutes of Health, the Robert Wood Johnson Foundation, and the Brown Institute for Media Innovation at the Columbia School of Journalism have supported her work.

Dr. Cogburn is interested in the ways we characterize and measure racism and the effects of racism on racial inequalities in health. She has focused on examining the effects of cultural racism in the media on acute physiological, psychological, and behavioral stress responses as well as associations between chronic psychosocial stress exposure and Black/White disparities in cardiovascular health and disease. She is also developing a project using data science to explore links between media-based racism and population health. Dr. Cogburn is the lead creator of 1000 Cut Journey, an immersive virtual reality racism experience that was developed in collaboration with the Virtual Human Interaction Lab at Stanford University and which premiered at the Tribeca Film Festival in 2018. The team is now exploring the use of the VR experience in affecting empathy, racial bias, structural competence and behavior. See full details.

Cause, Learn, Optimize, and Reason

This session will highlight advancements in data science: establishing causation as opposed to correlation, using transfer learning to make the most of imperfect data, designing optimized algorithms for graph problems, and reasoning about differential prediction.

Participating Speakers:

Samory Kpotufe
Associate Professor, Department of Statistics, Columbia University

Talk Title: Big but Imperfect Data: Fundamental Challenges of Domain Adaptation

Abstract: In many ML applications such as healthcare, IoT, and finance, perfectly representative data is hard to obtain. However, much data from related sources is often available, although not adequately representative of the target application. As such, many so-called ‘domain adaptation’ approaches have been developed to harness such large but imperfect data, often with a remarkable degree of success. However, a unified understanding of how and when such imperfect data can help remains elusive, making it hard to build upon previous successes. I’ll attempt to highlight key challenges and promising directions in this problem domain.

Bio: Samory Kpotufe graduated (Sept 2010) in Computer Science at the University of California, San Diego, advised by Sanjoy Dasgupta. He was a researcher at the Max Planck Institute for Intelligent Systems. At the MPI, he worked in the department of Bernhard Schoelkopf, in the learning theory group of Ulrike von Luxburg. Following this, he spent a couple of years as an Assistant Research Professor at the Toyota Technological Institute at Chicago and 4 years at ORFE, Princeton University as an Assistant Professor. Recently, he was a visiting member at the Institute for Advanced Study from January to July 2020. He works in statistical machine learning, with an emphasis on common nonparametric methods (e.g., kNN, trees, kernel averaging). His research interests include adaptivity, i.e., how to automatically leverage beneficial aspects of data as opposed to designing specifically for each scenario. This involves characterizing statistical limits, under modern computational and data constraints, and identifying favorable aspects of data that help circumvent these limits. See full details.

Elias Bareinboim
Associate Professor, Department of Computer Science, Columbia University

Talk Title: Causal Data Science

Abstract: Causal inference provides a set of tools and principles that allows one to combine data and structural invariances about the environment to reason about questions of counterfactual nature — i.e., what would have happened had reality been different, even when no data about this imagined reality is available. Reinforcement Learning is concerned with efficiently finding a policy that optimizes a specific function (e.g., reward, regret) in interactive and uncertain environments. These two disciplines have evolved independently and with virtually no interaction between them. In reality, however, they operate over different aspects of the same building block, i.e., counterfactual relations, which makes them umbilically tied.

Bio: Elias Bareinboim is an associate professor in the Department of Computer Science and the director of the Causal Artificial Intelligence Lab at Columbia University. He obtained his Ph.D. in Computer Science at the University of California, Los Angeles, advised by Judea Pearl. He is broadly interested in Artificial Intelligence, Machine Learning, Statistics, Robotics, Cognitive Science, and Philosophy of Science. His research focuses on causal inference and its applications to data-driven fields (i.e., data science) in the health and social sciences as well as artificial intelligence and machine learning. He is particularly interested in understanding how to make robust and generalizable causal and counterfactual claims in the context of heterogeneous and biased data collections, including issues of confounding bias, selection bias, and external validity (transportability). A survey of recent developments on this topic, when combining massive sets of research data, appeared in the Proceedings of the National Academy of Sciences (PNAS). See full details.

Clifford Stein
Professor, Industrial Engineering, Operations Research and Computer Science, Columbia Engineering

Talk Title: Parallel Algorithms for Massive Graphs

Abstract: Large graphs model many important problems in data science. When the graph is too large to fit in the memory of one computer, standard sequential algorithms do not work, or are so slow as to be useless. We will survey some recent progress on efficient parallel algorithms whose performance scales nicely with the size of the graph for many of the well-known basic graph problems such as connectivity, spanning trees, shortest paths and matchings.

Bio: Clifford Stein is a Professor of IEOR and of Computer Science at Columbia University. He is also the Associate Director for Research in the Data Science Institute. From 2008-2013, he was chair of the IEOR department. Prior to joining Columbia, he spent 9 years as an Assistant and Associate Professor in the Dartmouth College Department of Computer Science. His research interests include the design and analysis of algorithms, combinatorial optimization, operations research, network algorithms, scheduling, algorithm engineering and computational biology. Professor Stein has published many influential papers in the leading conferences and journals in his field, and has occupied a variety of editorial positions including the journals ACM Transactions on Algorithms, Mathematical Programming, Journal of Algorithms, SIAM Journal on Discrete Mathematics and Operations Research Letters. His work has been supported by the National Science Foundation and Sloan Foundation. See full details.

Melanie M. Wall
Professor of Biostatistics (in Psychiatry), Department of Biostatistics, Mailman School of Public Health, Columbia University

Talk Title: Data Science as the Engine for a Learning Health Care Service System for First Episode Psychosis in Coordinated Specialty Care

Abstract: A key initiative in research focused on treatment for first episode psychosis (FEP) is improving the implementation of evidence-based coordinated specialty care (CSC). One area of improvement is expected to come from improved data analytics facilitated by linking different clinical sites through common data elements and a unified informatics approach for aggregating and analyzing patient-level data. Through an NIMH-funded network and partnerships with the New York Office of Mental Health and the Columbia Department of Psychiatry, data science is contributing to a learning health care model. A few examples will be presented, including the extent to which predictive modeling of patient-level outcomes, based on background variables collected at intake and throughout care, can be used to differentiate individuals in a way that is useful. Presentation of results will focus on interpretability of differential prediction across sites and usefulness for facilitating service decisions.

Bio: Melanie Wall is the director of Mental Health Data Science in the New York State Psychiatric Institute (NYSPI) and Columbia University psychiatry department, where she oversees a team of 14 biostatisticians collaborating on predominantly NIH-funded research projects related to psychiatry. She has worked extensively with modeling complex multilevel and multimodal data on a wide array of psychosocial public health and psychiatric research questions in both clinical studies and large epidemiologic studies (over 300 total journal publications). Her biostatistical expertise includes latent variable modeling (e.g., factor analysis, item response theory, latent class models, structural equation modeling), spatial data modeling (e.g., disease mapping), and longitudinal data analysis including the class of longitudinal models commonly called growth curve mixture models. She received a Ph.D. (1998) from the Department of Statistics at Iowa State University, and a B.S. (1993) in mathematics from Truman State University. Before moving to Columbia University in 2010, she was on the faculty in Biostatistics in the School of Public Health at the University of Minnesota. See full details.

Martha Kim
Associate Professor, Computer Science, Columbia University (Moderator)

Bio: Martha Kim is an Associate Professor of Computer Science at Columbia University, where she leads the ARCADE Lab. Kim’s research interests are in computer architecture, parallel programming, compilers, and low-power computing. Her work has explored low-cost chip manufacturing systems, reconfigurable communication networks, and fine-grained parallel application profiling techniques. Her current research focuses on hardware and software techniques to improve the usability of hardware accelerators as well as data-centric accelerator design. Kim holds a PhD in Computer Science and Engineering from the University of Washington and a bachelor’s in Computer Science from Harvard University. She is the recipient of the 2013 Rodriguez Family Award in recognition of the research achievements of underrepresented junior faculty and a 2013 NSF CAREER award. See full details.

Alexis Avedisian

Jessica Rodriguez