Data Science Day 2018
Wednesday, March 28, 2018
5:00 am - 12:30 pm
All speakers' roles and titles are accurate as of the time of the event (2018).
Diane Greene, CEO, Google Cloud
Data, Insights, and Solution Journeys in the Cloud: This talk aimed to explore the convergence of big data, artificial intelligence, and the cloud.
Biography: After joining Google's Board in 2012, Greene joined Google full-time in December 2015 as CEO of Google Cloud. Today, Google Cloud is one of the top cloud computing players, leading in data analytics and ML, agile open development and deployment environments, security, and collaboration tools with G Suite. Prior to Google, Greene co-founded, ran, and sold three successful technology companies: VXtreme, a low-bandwidth streaming video company, which was bought by Microsoft; VMware, which EMC acquired and which Greene took public for a $19.1 billion first-day closing valuation; and Bebop, an enterprise SaaS vendor acquired by Google. Prior to VXtreme, Greene worked at two consulting firms as a naval architect, ran engineering for Windsurfer International, and worked at Sybase, Tandem, and SGI as a software engineer. Greene is on the board of MIT as a lifetime member of the MIT Corporation and remains on Alphabet's (formerly Google's) board. Greene served on the board of Intuit from 2006 through 2017.
Greene's degrees include an M.S. in computer science from the University of California, Berkeley, an M.S. in naval architecture from MIT, and a B.S. in mechanical engineering from the University of Vermont. Greene's recent recognitions include being named to the Bloomberg 50, as well as receiving the Anita Borg Institute Technical Leadership Award and a University of Vermont Honorary Doctor of Science, all in 2017. Greene is a lifelong sailor and was the 1976 Women's National Dinghy Champion.
Data and Democracy: A discussion of the implications of digital data for law, policy, and democracy. Wing and Bollinger will touch on topics including how platform companies such as Facebook, Twitter, and Google are heightening concerns about fake news and First Amendment rights, potential threats to our democracy, and the transformation of what democracy means.
Nicholas P. Tatonetti
Herbert Irving Assistant Professor of Biomedical Informatics, Vagelos College of Physicians and Surgeons, Columbia University
Talk Title: Disease Heritability using 7.4 Million Familial Relationships Inferred from EHRs
Abstract: Estimating heritability is essential for understanding the biological causes of disease, but it requires laborious patient recruitment and phenotype ascertainment. Electronic health records (EHRs) passively capture a wide range of clinically relevant data and provide a novel resource for studying the heritability of traits that are not typically accessible. EHRs contain next-of-kin information collected via patient emergency contact forms, but until now these data have gone unused in research. We mined emergency contact data at three academic medical centers and identified millions of familial relationships while maintaining patient privacy. Identified relationships were consistent with genetically derived relatedness. We used EHR data to compute heritability estimates for 500 disease phenotypes. Overall, estimates were consistent with the literature and between sites. Inconsistencies were indicative of limitations and opportunities unique to EHR research. These analyses provide a novel validation of the use of EHRs for genetics and disease research.
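To give a feel for how familial relationships turn into a heritability number, here is a minimal sketch of a textbook estimator: parent-offspring regression for a continuous trait. This is a swapped-in technique for illustration, not the method used in the talk (which covers binary disease phenotypes and many relationship types mined from EHRs). It assumes an additive genetic model with no shared environment, under which the slope of offspring values on one parent's values is about h^2 / 2. The function name and toy data are hypothetical.

```python
from statistics import mean

def parent_offspring_heritability(parent, offspring):
    """Estimate narrow-sense heritability h^2 of a quantitative trait as twice
    the slope of offspring values regressed on single-parent values.
    Assumes an additive model with no shared environmental effects."""
    if len(parent) != len(offspring) or len(parent) < 2:
        raise ValueError("need at least two paired observations")
    mp, mo = mean(parent), mean(offspring)
    cov = sum((p - mp) * (o - mo) for p, o in zip(parent, offspring))
    var = sum((p - mp) ** 2 for p in parent)
    slope = cov / var
    # A child shares half its alleles with one parent, so slope ~= h^2 / 2.
    return 2.0 * slope

# Hypothetical toy pairs (not data from the talk).
parents = [1.0, 2.0, 3.0, 4.0]
children = [1.5, 2.0, 2.5, 3.0]
print(parent_offspring_heritability(parents, children))
```

With these made-up pairs the slope is 0.5, so the sketch reports h^2 = 1.0; real estimates would need to handle binary traits, ascertainment, and shared household environment.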
Jeffrey Shaman
Associate Professor of Environmental Health Sciences; Director, Climate and Health Program, Mailman School of Public Health, Columbia University
Talk Title: Nowcasting and Forecasting Seasonal Influenza
Abstract: In recent years, a variety of methods have been developed to estimate the current and future growth and spread of infectious disease outbreaks. Here I describe some of the computational, mathematical and statistical approaches my research group has used to develop real-time nowcasting and forecasting of seasonal influenza. Assessment of the operational accuracy of these nowcasts and forecasts will also be discussed, as well as ongoing efforts to validate and improve these systems.
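The abstract does not spell out the group's models, but a common building block in influenza forecasting is a compartmental epidemic model integrated forward from an estimate of the current state (the nowcast). The sketch below is a deterministic SIR model stepped with forward Euler; the function name, starting state, and rates (beta, gamma) are hypothetical values chosen only for illustration, and no data assimilation or uncertainty quantification is included.

```python
def sir_forecast(s, i, r, beta, gamma, days, dt=0.1):
    """Integrate a deterministic SIR model forward with forward Euler steps.

    s, i, r are population fractions; beta is the transmission rate per day
    and gamma the recovery rate per day. Returns one (s, i, r) tuple per day,
    starting with the initial state."""
    trajectory = [(s, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory

# Hypothetical nowcast of the current state and assumed parameters.
traj = sir_forecast(s=0.9, i=0.001, r=0.099, beta=0.5, gamma=0.25, days=120)
peak_day = max(range(len(traj)), key=lambda d: traj[d][1])
print(f"forecast peak on day {peak_day}")
```

An operational system would repeatedly update the state and parameters as new surveillance data arrive and report forecast uncertainty; the fixed-parameter run above only shows the forward-projection step.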
Jacqueline Gottlieb
Professor of Neuroscience, Zuckerman Institute, Columbia University
Talk Title: Simplifying an Impossibly Complex World: Lessons from Biological Information Sampling Strategies
Abstract: Biological agents routinely make adaptive decisions in complex environments that they cannot fully comprehend. Faced with an overabundance of information, these agents have evolved a family of mechanisms for ignoring the vast majority of irrelevant inputs and very sparsely sampling relevant cues. Strikingly, however, the question of how the brain generates sampling policies (how it determines what to attend to and what to ignore) had until recently been relatively neglected in neuroscience and psychology. I will review recent advances in understanding this question, with a focus on the emerging theme that sampling is dictated not solely by material gains but also by intrinsic factors, including the uncertainty, effort, and pleasantness of the belief states that the information is expected to engender. These cognitive forms of utility can be quantitatively characterized and bring important insights into how intelligent agents cope with complex environments using potentially imperfect, yet efficient, question-answer strategies.
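One standard way to quantify the uncertainty-reduction value of sampling a cue is expected information gain: prior entropy minus the expected entropy of the posterior. The sketch below is an illustration of that general idea, not the speaker's model; it covers only the uncertainty factor (not effort or pleasantness) and assumes a binary hidden state and a binary cue of a given reliability. Function names are hypothetical.

```python
from math import log2

def entropy(p):
    """Shannon entropy (bits) of a binary belief with P(state = 1) = p."""
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)

def expected_information_gain(prior, reliability):
    """Expected reduction in uncertainty about a binary state from sampling a
    cue that reports the true state with probability `reliability`."""
    gain = entropy(prior)
    for cue in (1, 0):
        # Likelihood of this cue value under state = 1 and state = 0.
        like1 = reliability if cue == 1 else 1 - reliability
        like0 = 1 - reliability if cue == 1 else reliability
        p_cue = prior * like1 + (1 - prior) * like0
        if p_cue == 0:
            continue
        posterior = prior * like1 / p_cue
        gain -= p_cue * entropy(posterior)
    return gain

print(expected_information_gain(0.5, 1.0))  # fully reliable cue
print(expected_information_gain(0.5, 0.5))  # uninformative cue
```

Under this measure, an agent choosing among cues would prefer the fully reliable one (1 bit) over the uninformative one (0 bits); intermediate reliabilities fall in between.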
Suzanne R. Bakken
Professor of Biomedical Informatics, Vagelos College of Physicians and Surgeons, Columbia University
(Moderator)
Wolfram Schlenker
Professor of International & Public Affairs, Columbia SIPA
Talk Title: Agricultural Yields and Prices in a Warming World
Abstract: There is a strong nonlinear relationship between corn and soybean yields and temperature: yields increase with temperature up to roughly 30°C (86°F), beyond which further temperature increases become harmful. The slope of the decline above the optimum is significantly steeper than the incline below it. Climate change has the potential to significantly decrease yields: the beneficial effect of shifting cold temperatures toward the moderate optimum is more than offset by the harmful effect of shifting moderate temperatures toward hot ones. Changes in yields directly influence agricultural commodity prices, which are linked across periods through storage. One third of the reduction in agricultural production due to weather or biofuel mandates is offset by future supply increases, while the other two thirds come from reductions in demand.
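The asymmetry described above can be illustrated with a piecewise-linear temperature response: a gentle positive slope below the threshold and a steeper negative slope above it. In this sketch the threshold follows the abstract's roughly 30°C, but the slopes, the use of daily mean temperatures, and the function name are hypothetical assumptions, not estimates from the talk.

```python
def yield_effect(daily_temps_c, threshold_c=30.0, slope_below=0.005, slope_above=-0.05):
    """Sum a piecewise-linear temperature effect on log yield over a season.

    Each day contributes slope_below per degree up to the threshold and
    slope_above per degree beyond it. Coefficients are hypothetical."""
    total = 0.0
    for t in daily_temps_c:
        total += slope_below * min(t, threshold_c)
        total += slope_above * max(t - threshold_c, 0.0)
    return total

baseline = [24.0, 27.0, 31.0]
warmer = [t + 2.0 for t in baseline]  # uniform 2°C warming
change = yield_effect(warmer) - yield_effect(baseline)
print(change)
```

Warming the two cooler days adds 0.01 each, but pushing the hot day further past the threshold subtracts 0.10, so the net change in this toy season is -0.08: the harmful shift outweighs the beneficial one, as the abstract describes.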
Lisa Goddard
Director, International Research Institute for Climate and Society
Talk Title: Data & Finance in the Developing World
Abstract: Innovations in the creation and use of data in the developing world can help bring smallholder farmers out of poverty traps and allow humanitarian organizations to move populations from crisis mode to risk management. This presentation gives an overview of two examples in which enhanced observational datasets, seasonal forecasts with reliable, quantified uncertainty, and cost-benefit analysis have been applied, in collaboration with communities and decision makers, to lessen the impact of climate shocks on the agricultural sector and make humanitarian aid go further. The first example is climate-based index insurance for smallholder farmers in Africa. The second is forecast-based financing for the World Food Programme. The two approaches can be complementary and would also be relevant to other sectors and geographies.
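In index insurance, payouts depend on a measured index, such as seasonal rainfall, rather than on assessed losses on each farm. A common contract shape pays nothing above a trigger level, the full amount at or below an exit level, and a linearly increasing amount in between. The sketch below assumes that shape; the trigger, exit, and payout values and the function name are hypothetical and are not taken from the contracts discussed in the talk.

```python
def index_payout(rainfall_mm, trigger_mm, exit_mm, max_payout):
    """Payout for a rainfall-deficit index: nothing above the trigger,
    the full amount at or below the exit level, linear in between."""
    if exit_mm >= trigger_mm:
        raise ValueError("exit level must be below the trigger level")
    if rainfall_mm >= trigger_mm:
        return 0.0
    if rainfall_mm <= exit_mm:
        return float(max_payout)
    shortfall = (trigger_mm - rainfall_mm) / (trigger_mm - exit_mm)
    return shortfall * max_payout

for rain in (400, 250, 100):
    print(rain, index_payout(rain, trigger_mm=300, exit_mm=200, max_payout=100.0))
```

Because the payout depends only on the index, it can be computed quickly and cheaply; the trade-off is basis risk, where a farmer's actual losses differ from what the index implies.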
Geoffrey Heal
Donald C. Waite III Professor of Social Enterprise in the Faculty of Business; Professor of International & Public Affairs, Columbia Business School
Talk Title: Rising Waters: The Economic Impact of Sea Level Rise
Abstract: By the end of this century sea level may have risen by anywhere between 65 cm and 5 m. The exact number depends on which models we use and what we assume about mitigation of greenhouse gas emissions. Even increases at the lower end of this range will have far-reaching economic consequences. I describe a project that is modeling these consequences. We bring together a database of residential property transactions in the US since 1980, flood risk maps, flood insurance premium data, LIDAR elevation data and data on attitudes towards climate change to assess whether exposure to the risk of flooding by sea level rise is affecting property prices. We model the behavior of property buyers in the face of flood risk and test this model on the residential property transaction database. We also evaluate the risks to coastal infrastructure associated with sea level rise.
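A simple starting point for asking whether flood exposure affects property prices is a hedonic regression of log sale price on a flood-exposure indicator. With a single binary regressor, the OLS slope equals the difference in mean log prices between exposed and unexposed properties. The sketch below assumes exactly that stripped-down specification; the actual project would control for elevation, property characteristics, location, and time, none of which appear here. Prices and the function name are hypothetical.

```python
from math import exp, log
from statistics import mean

def flood_zone_discount(prices, in_flood_zone):
    """Regress log price on a single flood-zone dummy. With one binary
    regressor, the OLS slope equals the difference in mean log prices; the
    implied proportional price difference is exp(slope) - 1."""
    exposed = [log(p) for p, z in zip(prices, in_flood_zone) if z]
    safe = [log(p) for p, z in zip(prices, in_flood_zone) if not z]
    if not exposed or not safe:
        raise ValueError("need properties both inside and outside the zone")
    slope = mean(exposed) - mean(safe)
    return exp(slope) - 1.0

# Hypothetical sale prices in dollars (not from the project's database).
prices = [400_000, 420_000, 360_000, 378_000]
zone = [False, False, True, True]
discount = flood_zone_discount(prices, zone)
print(f"{discount:.1%}")
```

Here each exposed property sells for 90% of its unexposed counterpart, so the sketch reports a 10% discount.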
Garud N. Iyengar
Tang Family Professor of Industrial Engineering and Operations Research, Columbia Engineering
(Moderator)
Junfeng Yang
Associate Professor of Computer Science, Columbia Engineering
Talk Title: Effective Testing and Verification of Deep Learning Systems
Abstract: Machine Learning (ML) has made tremendous progress in recent years, achieving or surpassing human-level performance for a diverse set of tasks including image classification, speech recognition, and game playing such as Go. These advances have led to widespread adoption of ML in security- and safety-critical systems such as self-driving cars, malware detection, and aircraft collision avoidance systems. Unfortunately, ML systems, despite their impressive capabilities, often demonstrate unexpected or incorrect behaviors on corner-case inputs, leading to disastrous consequences such as fatal collisions of self-driving cars. In this talk, I’ll present some of our initial research towards testing and verifying the robustness of ML systems.
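One testing criterion explored in this research area is neuron coverage: the fraction of neurons that a test set drives above an activation threshold. Inputs that reach rarely activated neurons are more likely to exercise corner-case behavior. The sketch below assumes a single fully connected ReLU layer with hand-picked weights; it illustrates the metric only and is not the speaker's tooling. All names and values are hypothetical.

```python
def relu_layer(x, weights, biases):
    """One fully connected layer with ReLU activations."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def neuron_coverage(inputs, weights, biases, threshold=0.0):
    """Fraction of neurons whose activation exceeds `threshold` for at least
    one input in the test set."""
    covered = set()
    for x in inputs:
        for j, a in enumerate(relu_layer(x, weights, biases)):
            if a > threshold:
                covered.add(j)
    return len(covered) / len(biases)

# Hypothetical 3-neuron layer over 2-dimensional inputs.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, -5.0]
test_inputs = [[1.0, 0.0], [0.0, 2.0]]
coverage = neuron_coverage(test_inputs, W, b)
print(coverage)
```

The third neuron fires only when x0 + x1 < -5, a region the two test inputs never reach, so coverage is 2/3; adding an input such as [-3.0, -3.0] raises it to 1.0.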
Joshua Mitts
Associate Professor of Law, Columbia Law School
Talk Title: Informed Trading and Cybersecurity Breaches
Abstract: Cybersecurity has become a significant concern in corporate and commercial settings, and for good reason: a threatened or realized cybersecurity breach can materially affect firm value for capital investors. This paper explores whether market arbitrageurs appear systematically to exploit advance knowledge of such vulnerabilities. We make use of a novel data set tracking cybersecurity breach announcements among public companies to study trading patterns in the derivatives market preceding the announcement of a breach. Using a matched sample of unaffected control firms, we find significant trading abnormalities for hacked targets, measured in terms of both open interest and volume. Our results are robust to several alternative matching techniques, as well as to both cross-sectional and longitudinal identification strategies. All told, our findings appear strongly consistent with the proposition that arbitrageurs can and do obtain early notice of impending breach disclosures, and that they are able to profit from such information.
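The matched-sample idea can be illustrated with a difference-in-differences on log trading volume: the hacked firm's change from a baseline window to the pre-announcement window, minus its matched control's change over the same windows. This is a minimal sketch; the study's actual measures (which also include open interest), windows, and matching procedure are not given here, and the volumes and function name are hypothetical.

```python
from math import log
from statistics import mean

def abnormal_log_volume(treated_base, treated_event, control_base, control_event):
    """Difference-in-differences in mean log volume: the treated firm's change
    from a baseline window to the pre-announcement window, minus the matched
    control firm's change over the same windows."""
    def change(base, event):
        return mean(log(v) for v in event) - mean(log(v) for v in base)
    return change(treated_base, treated_event) - change(control_base, control_event)

# Hypothetical daily option volumes (contracts), not data from the study.
treated_base, treated_event = [1000, 1000, 1000], [2000, 2000, 2000]
control_base, control_event = [500, 500, 500], [550, 550, 550]
effect = abnormal_log_volume(treated_base, treated_event, control_base, control_event)
print(effect)
```

In this toy case the treated firm's volume doubles while the control's rises 10%, so the abnormal change is log(2 / 1.1), about 0.60; a systematic positive value across many breaches is the kind of pattern the abstract describes.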
David Blei
Professor of Statistics and Computer Science, Faculty of Arts and Sciences and Columbia Engineering
Talk Title: Shopper: Probabilistic Machine Learning for Consumer Choice
Abstract: I describe Shopper, a sequential probabilistic model of market baskets. Shopper uses interpretable components to model the forces that drive how a customer chooses products; it is designed to capture how items interact with other items. I describe an efficient inference algorithm to estimate these forces from large-scale data, and report a study of over five million transactions from a major chain grocery store. We are interested in answering counterfactual queries about changes in prices. We found that Shopper provides accurate predictions even under price interventions, and that it helps identify complementary and substitutable pairs of products.
This is joint work with Fran Ruiz (Columbia) and Susan Athey (Stanford).
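To illustrate the kind of structure described in the abstract (items that interact with other items, plus sensitivity to price), here is a heavily simplified softmax choice model. It is not Shopper's actual specification or inference algorithm: it assumes each candidate item's utility is a popularity term, plus an interaction between that item's vector and the average vector of items already in the basket, minus a log-price penalty. All vectors and prices are hypothetical, hand-set rather than learned.

```python
from math import exp, log

def choice_probabilities(items, basket, popularity, rho, alpha, prices, price_sensitivity):
    """Softmax over candidate items. Each item's utility combines a popularity
    term, an interaction term (its vector rho[c] dotted with the average alpha
    vector of items already in the basket) and a log-price penalty.
    `alpha` must contain a vector for every item in `basket`."""
    dim = len(next(iter(alpha.values())))
    if basket:
        context = [sum(alpha[j][k] for j in basket) / len(basket) for k in range(dim)]
    else:
        context = [0.0] * dim
    utilities = {}
    for c in items:
        interaction = sum(r * x for r, x in zip(rho[c], context))
        utilities[c] = popularity[c] + interaction - price_sensitivity * log(prices[c])
    m = max(utilities.values())  # subtract the max for numerical stability
    weights = {c: exp(u - m) for c, u in utilities.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

# Hypothetical hand-set vectors; in a real model these would be learned.
alpha = {"hot dogs": [1.0, 0.0], "ground beef": [0.0, 1.0]}
rho = {"hot dog buns": [2.0, 0.0], "hamburger buns": [0.0, 2.0], "mustard": [1.0, 1.0]}
popularity = {c: 0.0 for c in rho}
prices = {c: 2.0 for c in rho}

before = choice_probabilities(list(rho), ["hot dogs"], popularity, rho, alpha, prices, 1.0)
# Counterfactual query: double the price of hot dog buns.
after = choice_probabilities(list(rho), ["hot dogs"], popularity, rho, alpha,
                             {**prices, "hot dog buns": 4.0}, 1.0)
print(before)
print(after)
```

With hot dogs in the basket, hot dog buns (the complement) are the most likely next item; doubling their price lowers that probability, which is the kind of counterfactual price query the abstract mentions.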
Adler Perotte
Assistant Professor in the Department of Biomedical Informatics, Vagelos College of Physicians and Surgeons, Columbia University
(Moderator)
DSI Industry Affiliates have access to Data Science Day recordings after the event. If you are a current DSI Industry Affiliate please contact us at datascience@columbia.edu for a link to the videos.