More than 1,800 people from around the world registered for Data Science Day 2020, a virtual event during which Columbia professors and former Google CEO Eric Schmidt discussed ethics and privacy in data science.
In her welcoming address, Jeannette M. Wing, Avanessians Director of the Data Science Institute and professor of computer science, characterized data science as an emerging field that has profound societal consequences and “requires us to face [the issues of ethics and privacy] head on from the very beginning.” Data science relies on data, but data is about people, “about us, so how can we build models and systems while preserving the privacy of our data?” Wing asked.
That was the central question that four Columbia professors addressed in short virtual presentations known as lightning talks. DSI member Tamar Mitts, assistant professor of international and public affairs at the School of International and Public Affairs, moderated the talks.
The NeuroRights Initiative: Human Rights Guidelines for Neurotechnology and AI in a Post-Covid World
Rafael Yuste, professor of biological sciences, discussed the NeuroRights Initiative, a set of ethical codes and human rights directives he developed to protect people from potentially harmful neurotechnologies by ensuring the responsible development of brain-computer interfaces and similar devices. To safeguard the development of all such devices, Yuste crafted five neurorights to guide the policymakers, technologists, and scientists who regulate emerging neurotechnologies. Yuste has also written a technocratic oath, inspired by the Hippocratic oath, that he hopes all engineers, scientists, and military personnel working on neurotechnologies will sign and abide by.
“The goal of the NeuroRights Initiative is to preempt the creation of harmful neurotechnologies and AI algorithms by providing ethical frameworks for entrepreneurs, physicians, and researchers developing neurotechnology and AI,” Yuste said.
The Effect of Privacy Regulation on the Data Industry: Empirical Evidence From GDPR
Yeon-Koo Che, professor of economic theory in the Department of Economics, discussed his recent study on data privacy, which revealed an ironic twist: The European Union’s General Data Protection Regulation, intended to give people more control over their personal data, had the effect of making it easier for advertisers to track certain people. In the two years after GDPR took effect, 10.7 percent of users opted out of sharing their data, but the trackability of the remaining users increased by about 8 percent, according to Che’s study. He and his collaborators studied data from an anonymous third-party advertising company that recorded keyword searches and purchases for online travel agencies.
“We studied the impact of GDPR to highlight how government-mandated privacy protections interact with other privacy means,” Che said. “Do consumers benefit? That depends on how firms use improved prediction. Do firms suffer? They lose consumers from opt-out, but remaining consumers are of higher value to them.”
Data Science Ethics: A View From Public Health
How can data scientists, who work mostly with numbers, learn from public-health practitioners, who conduct human-subject research and are forced to confront the human element of data collection? In his lightning talk, Jeff Goldsmith, associate professor of biostatistics at Columbia’s Mailman School of Public Health, offered data scientists a few recommendations: always consider consent, representation, and the unintended consequences of their work.
He suggested that data scientists ask hard questions about their data: Who is included? Are the measurement and sampling processes valid? What causal mechanism is being assumed or implied? Always involve stakeholders and plan for dialogue, Goldsmith said, and “understand who is likely to be impacted, and how, and engage at each stage—conceptualization, implementation, dissemination, which implies the need for transparency and openness.”
Differential Privacy: Basics and Latest Research
Differential privacy is a statistical approach used to protect the personal data of individuals within large groups. Here, individual contributions are deliberately obscured, typically by adding calibrated random noise, while aggregate statistical properties can still be shared between data-processing partners. The protection is needed because releasing accurate statistics from a dataset might seem safe, yet people’s personal purchase data can still be reconstructed from those statistics, said Roxana Geambasu, an associate professor of computer science who studies how to safeguard data used in machine learning.
She detailed how she and fellow researchers are developing ways to enhance differential privacy, which will have broad applications in helping safeguard personal data, she said. “We seek to address programming, testing and production challenges, provide tools for privacy budget management and incorporate differential privacy into real infrastructure systems.”
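As a concrete illustration of the basic idea behind differential privacy, the sketch below applies the Laplace mechanism, the canonical differential-privacy technique, to a simple counting query. The dataset, function names, and parameter values are hypothetical and are not drawn from Geambasu’s work:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def laplace_count(data, predicate, epsilon):
    """Release a count query with epsilon-differential privacy.

    A counting query has sensitivity 1: adding or removing one
    person's record changes the true count by at most 1, so noise
    drawn from Laplace(scale=1/epsilon) is enough to mask any
    single individual's presence in the dataset.
    """
    true_count = sum(1 for record in data if predicate(record))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical purchase records: (user_id, amount_spent)
purchases = [(1, 120.0), (2, 35.5), (3, 310.0), (4, 89.9), (5, 150.0)]

# How many users spent more than $100? The released answer is noisy,
# so an observer cannot tell whether any single user is included.
noisy_answer = laplace_count(purchases, lambda r: r[1] > 100, epsilon=1.0)
```

A smaller epsilon yields stronger privacy but noisier answers; managing the cumulative “privacy budget” across many such queries is one of the practical challenges Geambasu’s group works on.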
Keynote Address: Eric Schmidt
After the four lightning talks, Eric Schmidt, executive chairman and co-founder of Schmidt Futures, discussed the intersection of human rights, data privacy, and AI technologies and what he has learned over the course of his career. Schmidt admitted that he has been wrong in some of his predictions about technology and economics, but said his errors were born of optimism about the use of technology for good.
The past few years, however, have shown the negative aspects of technology, and addressing them is neither straightforward nor easy, he said. How, for example, do you regulate social media and fake news while safeguarding free speech? How do you regulate big tech companies without hobbling their creativity and growth? And how can the U.S. compete with China in the race for AI technologies, when the Chinese government handpicks AI companies to support and protect? These were some of the questions he discussed, first in his address and then in a fireside chat with Wing. He also took questions from the virtual audience by way of chat.
“I spent 40 plus years believing that technology was a strong force of good,” he said, “and I must say I get angry when the reality of the world collides with my relatively naive and simplistic view that technology should just make people better… But universities like Columbia are trying very hard to develop a broader and deeper understanding of where the technology really affects people.”
— Robert Florida