Data Science Day, 2019 - Lightning Talks & Abstracts

Lightning Talk I: Data Science Foundations: Today & Tomorrow

Michael Collins, Vikram S. Pandit Professor of Computer Science
“Successes and Challenges in Neural Models for Speech and Language”

In recent years there has been dramatic progress in key problems in speech and natural language processing (NLP), largely driven by neural methods. In this talk Collins will describe a sequence of NLP/speech problems and neural architectures of increasing complexity. Collins will detail the successes of these approaches and also the (many) questions that they raise.

Liam Paninski, Professor of Computer Science
“Neural Data Science”

The neural coding problem is perhaps the fundamental question in systems neuroscience. Given some input stimulus, movement, or thought, what is the probability of a neural response? In other words, what is the neural code? Modern multi-neuronal recordings produce single-cell-resolution data on a large scale. Neural data science aims to extract meaning from the resulting huge new streams of data. This lecture will highlight some recent progress and current challenges in this rapidly growing field, where new methods for network analysis, dimensionality reduction, and optimal control, developed in lockstep with advances in experimental neurotechnology, promise breakthroughs in solving multiple fundamental neuroscience problems.

Tim Roughgarden, Professor of Computer Science
“Studying Auctions for Online Advertising and Pricing in Thin Markets”

Auctions for online advertising power the business models of many big tech companies such as Google and Facebook. How should such auctions set prices for ads? This problem is particularly challenging in thin markets with a relatively small number of competitors. Professor Tim Roughgarden will discuss research on data-driven approaches to meeting these challenges.

Lightning Talk II: How AI is Changing Industry

Simona Abis, Assistant Professor of Business
“Man + Machine: The Future of Labor and Knowledge Production”

Technological advancement has always been at the core of innovation and development in most industries. From the Industrial Revolution to the present day, it has been strongly intertwined with the demand for labor and the skills required of the labor force. The current technological disruption, driven by advances in AI and computing power, is no different. To understand the economic implications of these advances, we must take into account the profit-maximizing motives of firms and how these technologies might change their needs, incentives, and decision-making processes.

Nima Mesgarani, Associate Professor of Electrical Engineering
“Brain-controlled Assistive Hearing Technologies: Challenges and Opportunities”

Listening in noisy and crowded environments is exceptionally challenging for hearing-impaired listeners. Assistive-hearing devices can suppress certain types of background noise, but they cannot help a user attend to a single conversation among many without knowing which person is speaking. Recent scientific discoveries about speech processing in the human auditory cortex have motivated several new paths to enhance the efficacy of hearable technologies. These possibilities include speech neuroprostheses, which aim to establish a direct communication channel with the brain; auditory attention decoding, in which the similarity of a listener’s brainwaves to the sources in the acoustic scene is used to identify the attended source; and improved speech perception using electrical brain stimulation. In parallel, the field of speech signal processing has recently seen tremendous progress due to the emergence of deep learning models, to the point where even solving the “cocktail party problem” is no longer out of reach. Nima Mesgarani will discuss recent efforts to bring together the latest progress in brain-computer interfaces and speech processing technologies to design and build the next generation of assistive hearing devices, with the potential to augment speech communication in realistic and challenging acoustic conditions.

Julian Nyarko, Postdoctoral Research Scholar in the Faculty of Law
“Corporate Climate: Using Machine Learning to Assess Climate Risk Disclosures and Susceptibility”

The risks associated with climate change are becoming increasingly relevant to investors. However, while the Securities and Exchange Commission mandates the disclosure of climate risks by public registrants, whether these companies actually make adequate disclosures has been difficult to verify. We leverage recent advancements in text analysis and machine learning to identify climate risk disclosures in corporate filings. We then create an objective framework for assessing which companies should be making these disclosures. By comparing the companies that disclose climate-change-related risks with those that should, we are able to gain insights into the effectiveness of the current regulatory framework.

Lightning Talk III: A Private, Secure, & Safe World

Ronghui Gu, Assistant Professor of Computer Science
“Towards Building Trustworthy Blockchain Ecosystems”

Blockchain ecosystems are built on trust. Some people call it a “consensus,” some people call it a “belief.” However, the code written to implement such blockchain ecosystems is not trustworthy due to program bugs. Gu’s work is focused on making software systems reliable and secure through the use of a mathematically rigorous technique known as formal verification. As the backbone of modern software systems, operating system (OS) kernels impact the reliability and security of today's computing hosts. OS kernels, however, are complicated and prone to bugs. In the past several years, Gu has designed and developed CertiKOS, the world's first formally verified, concurrent OS kernel, proven to be bug-free and hacker-resistant. Gu uses CertiKOS and applies formal verification techniques to build trustworthy software with applications to many technologies, including blockchain systems. He will discuss why his research is considered a significant scientific breakthrough as well as a giant leap for blockchain technology.

Mark Hansen, David and Helen Gurley Brown Professor of Journalism and Innovation; Director, David and Helen Gurley Brown Institute for Media Innovation
“To Reduce Privacy Risks, the Census Plans to Report Less Accurate Data”

When the Census Bureau gathered data in 2010, it made two promises: the form would be “quick and easy,” and “your answers are protected by law.” But mathematical breakthroughs, easy access to more powerful computing, and widespread availability of large and varied public data sets have made the bureau reconsider whether the protection it offers Americans is strong enough. The Census Bureau has decided to enforce stronger privacy protections than companies like Apple or Google had when they each first took up differential privacy. To preserve confidentiality, the bureau’s directors have determined they need to adopt a “formal privacy” approach, one that adds uncertainty to census data before it is published and achieves privacy assurances that are provable mathematically. Guaranteeing people’s confidentiality is critical and increasingly challenging, but some scholars worry that the new system will impede research. Hansen will discuss the pros and cons from both perspectives.
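
As a rough illustration of what "adding uncertainty" to a published statistic can mean, the sketch below applies the textbook Laplace mechanism from the differential privacy literature to a single count. It is not the Census Bureau's actual procedure, which the abstract does not describe. The function names, the block population of 120, and the epsilon values are all assumptions made up for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two independent exponential draws with rate 1/scale
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return a count perturbed by the Laplace mechanism (hypothetical helper).

    sensitivity=1.0 assumes that adding or removing one person changes the
    count by at most 1. Smaller epsilon means stronger privacy and more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(2019)
    block_population = 120  # invented value, for illustration only
    for epsilon in (0.1, 1.0, 10.0):
        print(epsilon, round(noisy_count(block_population, epsilon, rng), 1))
```

The trade-off Hansen describes shows up directly in the `epsilon` parameter: a small value protects respondents more strongly but makes the published count less accurate, while a large value does the reverse.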

Tamar Mitts, Assistant Professor of International and Public Affairs
“Global Radicalization in an Internet Age”

Between 2011 and 2016, the Islamic State successfully convinced tens of thousands of individuals around the world to join its ranks. Many attribute this surge in foreign recruits to sophisticated internet media campaigns developed by the group since 2011. Yet, there is currently very little empirical analysis of what was ‘marketed’ in ISIS’s propaganda, what messages resonated with potential recruits, and what types of content were more likely to radicalize. Employing information on network connections, we find that propaganda messages relating to grievances, ideology, and the material and social desires of potential recruits were highly effective at increasing online support for ISIS. Strikingly, however, we find that these messages became largely ineffective when propaganda included scenes of brutal violence. These findings suggest that what attracted individuals to ISIS was not the violent content that made the group so famous, but the messages in its propaganda that conveyed the material and spiritual benefits of recruitment.

Lightning Talk IV: Improving Patient Outcomes Through Data Science

Andrea Baccarelli, Leon Hess Professor of Environmental Health Sciences; Chair, Department of Environmental Health Sciences
“Data Science and Epigenomics – Solving 21st Century Public Health Challenges”

Epigenomics is the study of the programming of, and changes in, gene expression that do not depend on the DNA sequence. Remarkably, the human epigenome is a flexible, environmentally sensitive component of human biology that changes over time. This sensitivity has been used in the field, including in our lab, in attempts to develop new biosensors of environmental exposures and lifestyle. We have been mining epigenomics data to develop algorithms that can reveal someone’s “true” biological age, as well as predict whether someone is a smoker and, if so, how many cigarettes they have smoked during their lifetime. Baccarelli will present possible applications: for instance, he has developed a biosensor of exposure to toxic lead, which can estimate lifetime exposure to lead from a single drop of blood. Baccarelli will also discuss how, in the future, data science coupled with molecular biology can open new public health and commercial opportunities.

Carri W. Chan, Associate Professor of Business
“An Examination of Early Transfers to the ICU Based on a Physiologic Risk Score”

Unplanned transfers of patients from general medical-surgical wards to the Intensive Care Unit (ICU) can occur due to unexpected patient deterioration. Such patients tend to have higher mortality rates and longer lengths-of-stay than direct admissions to the ICU. As such, the medical community has invested substantial effort in developing patient-risk scores intended to identify patients at risk of deterioration. In this work, Chan considers how one such risk score could be used to trigger proactive transfers to the ICU. Chan utilizes a retrospective dataset from 21 Kaiser Permanente Northern California hospitals to estimate the potential benefit of transferring patients to the ICU at various levels of patient risk of deterioration. In order to reduce the sensitivity of the findings to key identification and modeling assumptions, she uses a combination of multivariate matching and instrumental variable approaches. Using the empirical results to calibrate a simulation model, Chan finds that proactively transferring the most severely ill patients could reduce mortality rates and lengths-of-stay without increasing other adverse events. However, proactive transfers should be used judiciously, as being too aggressive could increase ICU congestion and degrade quality of care.

George M. Hripcsak, Vivian Beaumont Allen Professor of Biomedical Informatics; Chair, Department of Biomedical Informatics
“Steering Medical Therapy Through Large-Scale Clinical Data”

Doctors frequently have questions about which drug is best to use, what side effects that drug might cause, or whether prescribing two drugs together will cause a problem. Yet the vast majority of questions like these have gone unanswered. Today, medical records and insurance data make it possible to answer these questions, although there is a risk of getting the wrong answer. If, for example, healthier patients take one drug rather than another, then the first drug may appear to work better. The Observational Health Data Sciences and Informatics (OHDSI) initiative applies advanced data-science techniques to avoid such errors. OHDSI is an interdisciplinary and international collaborative with a coordinating center at Columbia University. With half a billion patient records, OHDSI conducts federated studies at sufficient scale to answer questions about diagnosis and treatment. This talk will illustrate OHDSI’s approach and discuss how its studies have provided significant insights on treatment pathways for chronic diseases around the world.
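
The "healthier patients take one drug" pitfall can be made concrete with a small worked example. The sketch below uses simple stratification by baseline health, a standard textbook adjustment and not necessarily the method OHDSI applies. The drug labels, strata, and every count are invented for illustration.

```python
# (drug, baseline health) -> (patients, recovered).
# All counts are invented for illustration; they are not real clinical data.
COUNTS = {
    ("A", "healthier"): (900, 810),
    ("A", "sicker"): (100, 50),
    ("B", "healthier"): (100, 90),
    ("B", "sicker"): (900, 450),
}
DRUGS = ("A", "B")
STRATA = ("healthier", "sicker")


def crude_rate(drug):
    """Recovery rate for a drug, ignoring baseline health."""
    patients = sum(COUNTS[(drug, s)][0] for s in STRATA)
    recovered = sum(COUNTS[(drug, s)][1] for s in STRATA)
    return recovered / patients


def standardized_rate(drug):
    """Recovery rate if the drug's patients had the whole cohort's health mix."""
    total = sum(n for n, _ in COUNTS.values())
    rate = 0.0
    for s in STRATA:
        stratum_size = sum(COUNTS[(d, s)][0] for d in DRUGS)
        n, recovered = COUNTS[(drug, s)]
        # Weight each stratum-specific rate by that stratum's share of the cohort.
        rate += (stratum_size / total) * (recovered / n)
    return rate


if __name__ == "__main__":
    for drug in DRUGS:
        print(drug, round(crude_rate(drug), 2), round(standardized_rate(drug), 2))
```

In these made-up numbers the two drugs perform identically within each health group, yet the crude comparison shows drug A at 0.86 and drug B at 0.54 because most of A's patients were healthier to begin with. After weighting both drugs to the same health mix, both come out at 0.70.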


550 W. 120th St., Northwest Corner 1401, New York, NY 10027    212-854-5660
©2019 Columbia University