Columbia University hosted the first Data Science Week, which brought together researchers, industry experts, policymakers and other professionals working at the forefront of data science. Over the course of three daylong conferences (March 26-28), they explored how to harness the data science revolution in ways that enhance knowledge, benefit society and improve the quality of life.

“These three back-to-back conferences showed that Columbia is an international leader in data science,” said Jeannette M. Wing, Avanessians Director of the Data Science Institute at Columbia. “I’m delighted that prominent data science professionals from industry, academia and government, as well as the nonprofit and foundation sectors, came to campus to discuss how to advance the state of the art in data science and use data for good.”

Here are details on the three conferences that together constituted Data Science Week:

Data Science Leadership Summit

Data science leaders from colleges and universities across the U.S. gathered at Columbia for the Data Science Leadership Summit, the first meeting to bring together the academics who direct the nation’s data science centers and institutes. Some 65 heads of data science from more than 30 colleges and universities attended the March 26 Summit.

The Summit was organized by Wing, a former Corporate Vice President of Microsoft Research who is one of the nation’s foremost technology leaders. Since she was named director of the Data Science Institute (DSI) in July, her goal has been to bring the directors of data science centers together.

“I’m excited to begin building an academic community in data science,” Wing said. “The overriding objective of the Summit was to explore what we can do together to advance the field of data science. It was a day where we shared best practices, discussed where we face similar challenges and opportunities, and talked about preparing the next generation of data scientists as well as how to create a community around data science.”

As data science emerges as a discipline, data science initiatives at colleges across the U.S. and beyond have been sprouting up rapidly, added Wing. Whereas a few years ago there were a handful of data science institutes and centers, there are now more than 30 in top public and private universities.

Wing kicked off the Summit with a welcoming address titled “Data Science in Academia.” The morning’s speakers and discussions centered on data-driven research as well as the foundations and applications of data science. Participants discussed how to advance the state of the art in data science and how to use it to advance research in other fields.

The challenge for a university is how to support data science, an inherently interdisciplinary field, especially in its breadth of applications, when academia is often structured to excel within disciplinary boundaries, said Wing.

“Universities today answer this question differently – some create multi-disciplinary research institutes, some create new academic units while others add data science to existing programs or departments,” added Wing. “We are all in the early stages of figuring it out for our respective institutions. There’s no one right answer since each university has its own culture, traditions, and organizational structure.”

The afternoon featured plenary talks, breakout sessions and roundtable discussions on topics such as ethics in data science, developing an academic cloud, best practices in undergraduate and graduate education and how to engage with industry, government and foundations to build a data science community. Kathy McKeown, the founding director of the Data Science Institute and a professor of Computer Science at Columbia Engineering, provided an overview of the National Academies Roundtable on Data Science.

Participants at the Summit worked in unison to answer these questions:

Research: How can we support inherently interdisciplinary research, if not in our foundations, then in our applications? How is faculty hiring (e.g., joint appointments) done? What kind of institutional support is needed? How are data science demands across campus being met (e.g., through applied data scientists, post-doc fellows)?

Education: What should every undergraduate know? What should every undergraduate data science major know? What should every master’s student know to be prepared for industry, not just the technology industry? What makes sense for courses, dissertation topics, and in terms of advising students? How do we advance the field of data science and support its broad applicability at the PhD level?

Ethics: How are people teaching students and encouraging research on ethics and privacy? What should we advocate as a data science community?

Engagement: What kinds of engagement do data science units have with industry, local government, foundations, other universities, and K-12? What is working and what does not work?

Compute Infrastructure: What kind of computing support is there for hosting data sets and providing data center-scale compute, including GPUs, etc., or do people use the cloud? How do we sustain open-source efforts that are germane to data science? How do we ensure secure access and sharing of data sets?

People: What kinds of personnel are data science units hiring to help bridge the demand for data science expertise across campus and in contending with the limited supply? Are universities creating a new kind of faculty or technical staff to hire these people? Are there new models that universities need to create to allow for a freer flow between academia and industry?

Sustaining a community: One intended outcome of this summit is to create an academic community around data science. How can attendees sustain such a community for the long term?

The Summit was supported by the National Science Foundation, the Alfred P. Sloan Foundation and the Gordon and Betty Moore Foundation. Attending the Summit were leaders of the four NSF Big Data Hubs and the principal investigators for the NSF Transdisciplinary Research in Principles of Data Science (TRIPODS) awards. Columbia is a lead institution for both the Hubs and the TRIPODS research.

“The Data Science Institute at Columbia, with 300 affiliated researchers from every school and department at the university, is transforming all fields, sectors and professions through the application of data science,” said Wing. “Columbia is a world leader in the field, and by bringing together data science leaders for this Summit, I know we will push the field a giant step forward.”

Annual Summit of the Northeast Big Data Innovation Hub

On March 27, the Northeast Big Data Innovation Hub, hosted at Columbia, held its annual summit, which convened the data science community of the Northeast United States.

The day featured updates on cross-sector initiatives, lightning talks from Big Data Spoke researchers and breakout sessions on data literacy, ethics, and health. The Hub, supported by the National Science Foundation, is a regional network of academic, industry and government partners who work in tandem to spur data-driven innovations and use data analytics to address society’s most pressing problems.

“The Hub brings together data science leaders from across all sectors – academia, industry, government, and non-profit – to build a community greater than the sum of its parts,” said René Bastón, Executive Director of the Northeast Big Data Innovation Hub. “Our event showcased the breadth of perspectives among our data science stakeholders, and presented a great opportunity for them to collaborate on addressing large challenges with data-driven innovations.”

The Summit’s keynote speaker, Corinna Cortes, Head of Google Research, New York, discussed her team’s data-driven approach to fighting fake news. At Google, Cortes is working on a broad range of theoretical and applied large-scale machine learning problems. Additionally, a panel of leaders from academia and industry discussed the challenges of rapidly advancing digital media, both in terms of maximizing its benefits and minimizing its potential drawbacks.

Following lunch, leaders of the Hub’s Big Data Spokes – multi-institutional, multi-sector collaborations that focus on topics of specific interest to the Hub community – highlighted the current work being done in their fields. Jane Greenberg, Professor at Drexel University and Director of the Metadata Research Center, talked about data sharing between sectors and the legal and privacy concerns that make sharing agreements difficult. Jaclyn Ocumpaugh, Associate Director, Penn Center for Learning Analytics, discussed the transformative effect that big data can have on education. And Chirag Lakhani, Research Fellow at Harvard Medical School’s Department of Bioinformatics, described a search engine (ExposomeDW) that finds environmental and phenotypic factors associated with disease and health.

In breakout sessions, stakeholders worked in groups to advance projects in data literacy, ethics, and health. A Data Literacy group worked to produce a draft framework of principles that define data-literacy concepts, and a Health group featured a talk from Vasant Honavar, Professor and Chair of Information Sciences and Technology at Penn State. Chirag Lakhani and Shreyas Bhave, an Undergraduate Research Assistant at Johns Hopkins, also demonstrated ExposomeDW. An Ethics panel featured lightning talks from a host of data science experts, followed by a group discussion and working session aimed at developing a proposal for a project on data ethics.


Breakthrough Research Featured at Data Science Day

In the final event of the week, Columbia held its third annual Data Science Day, a March 28 conference that showcased the research of university professors who use the most advanced techniques in data science to transform all fields, professions and sectors and solve some of society’s most vexing problems. The professors are affiliated with the Data Science Institute (DSI) at Columbia, a world leader in data science research, education and outreach. The day aimed to foster collaboration between data-driven innovators in academia, industry and government. Researchers also illustrated how they use data science to deepen their understanding of topics such as public health and medical research, climate change and financial risk.

“As a renowned university with top schools and departments, Columbia is the perfect laboratory in which to explore the transformation of all fields through the application of data science,” said Wing. “And for that transformation to spread widely, data scientists and domain experts must work with industry partners and policy makers to harness the data science revolution in ways that best serve society. That was our purpose and hope in hosting Data Science Day.”

In quick summaries of their research, known as lightning talks, professors from across Columbia presented how they use data science to address problems like the onslaught of fake news and cyber attacks that threaten our privacy, our financial security and even our democracy. The professors call upon the most advanced techniques in data science – machine learning, neural networks, topic modeling, Bayesian statistics and deep learning – to conduct their research. The focal point for this breakthrough research is the Data Science Institute, whose 300 affiliated faculty members work in all fields, departments and schools throughout Columbia University.

During a fireside chat at the conference, Wing discussed Data and Democracy with Columbia President Lee Bollinger. Along with being distinguished leaders, Wing and Bollinger are also prominent thinkers in their respective fields. Wing led Carnegie Mellon’s Computer Science Department and also oversaw the National Science Foundation’s computer and information science and engineering directorate. Bollinger is Columbia’s first Seth Low Professor, a member of the Columbia Law School faculty, and one of the nation’s foremost First Amendment scholars. In their chat, they discussed the implications of digital data for law, policy and democracy. They also discussed how platform companies such as Facebook, Twitter, and Google are heightening concerns about fake news and First Amendment rights. The two explored questions such as: Are there new threats to our democracy that we need to worry about? Or are we simply witnessing a transformation of what democracy means?

The day’s keynote address was given by Diane Greene, CEO of Google Cloud. She talked about the convergence of big data, artificial intelligence, and the cloud. Greene is one of the most prominent executives in enterprise technology. Before joining Google, she co-founded and sold three successful technology companies. She is on the board of MIT as a lifetime member of the MIT Corporation and remains on the board of Alphabet, the parent company of Google.

Here’s a list of the lightning talks given by Columbia researchers:

Health Discovery From and For Data Science


Nicholas P. Tatonetti, Herbert Irving Assistant Professor of Biomedical Informatics, and Director, Climate and Health Program: Using Electronic Health Records to Study the Heritability of Traits and Disease

Jeffrey Shaman, Associate Professor of Environmental Health Sciences: Developing Real-time Nowcasting and Forecasting of Seasonal Influenza

Jacqueline Gottlieb, Professor of Neuroscience: Knowing What is Important: How Humans Decide To What To Attend

Climate + Finance: Use of Environmental Data to Measure and Anticipate Financial Risk


Geoffrey Heal, Donald C. Waite III Professor of Social Enterprise in the Faculty of Business; Professor of International & Public Affairs: Evaluating the Risks of Sea-level Rise on Property Values and Coastal Infrastructure

Lisa Goddard, Director of International Research Institute for Climate and Society: Using Data to Help African Farmers and to Forecast Financing for Humanitarian Aid

Wolfram Schlenker, Professor, School of International and Public Affairs: Agricultural Yields and Prices in a Warming World

Machine Learning: The Good, The Bad and The Law


David Blei, Professor of Statistics & Computer Science: Designing a Model to Show How Customers Choose Products

Junfeng Yang, Associate Professor of Computer Science: Effective Testing and Verification of Deep Learning Systems

Joshua Mitts, Associate Professor of Law: The Effect of Cybersecurity Breaches on Financial Markets

Professors and their students also demonstrated their research with poster presentations. The day ended with a reception for industry, government, faculty and students.

The Data Science Institute, founded in 2012, is an international leader in data science research, education and outreach. Its mission is to advance the state of the art in data science; to transform all fields through the application of data science; and to ensure the responsible and ethical use of data to benefit society. DSI is also training the next generation of data scientists and developing innovative technology. With 300 affiliated faculty members working in all fields and schools across Columbia, the Institute fosters collaborations that advance the field of data science while addressing the urgent problems facing our society.

— Robert Florida