Ethics, Data, and Product
There's a lot to unpack in the headline "Facebook's suicide alert tool isn't coming to the EU." It immediately reminded me of two topics we discussed at the beginning and the end of the course "Data: Past, Present, and Future" last spring, a class we'll be teaching again next semester.
We opened the class with cautionary tales of how data-powered algorithms shape our world and our reality — for example, Facebook's emotional contagion experiment, published in 2014, and the ethical fallout that followed. We ended the class by talking about the forces that shape and constrain innovation, including the ways that government, for example via the European Union's General Data Protection Regulation, impacts the role of AI in so many people's personal lives today.
With no prerequisites, the class drew humanists and scientists alike. One of many benefits of co-developing the class with Professor Matt Jones of Columbia's History department is his expert understanding of how we got here, including how we came to think about the ethics of research involving human beings. As we discussed in class, the way academic researchers think about ethics has been heavily shaped by the development of Institutional Review Boards, which institutionalize the values articulated in the Belmont Report of 1979. The Report emerged in response to egregious ethical lapses committed in the name of science or engineering, particularly in the 20th century.
The Belmont Report set out high-level, abstract principles: respect for persons, beneficence, and justice. Putting these principles into practice as working standards and rules requires subjective design choices. The dangers of data-intensive research and commerce, and the need to rethink the political and ethical framework for pursuing them, have long been well understood in the legal theory community (e.g., http://lsolum.typepad.com/legaltheory/2009/09/legal-theory-lexicon-rules-standards-and-principles.html), in the quantitative social sciences (see, e.g., http://bitbybitbook.com for history and references), and in science and technology studies (see Solon Barocas’ great syllabus for a survey).
Academic research operationalizes the Belmont principles via the standards and rules of Institutional Review Boards. How might corporations operationalize these values? Ideally, the design choices made in shaping data-empowered algorithms would take these standards and higher-level principles into account, and not simply settle for maximizing the key performance indicators technologists are employed to optimize, regardless of the effects on society as a whole and on the dignity of each individual user, customer, and citizen.
Corporate actors don’t follow these rules, and generally they don’t have to. The gap between the principles shaping our understanding of ethical research and the subjective design choices of software engineers in industry is wider than one of philosophy versus engineering: our operationalization of these principles via Institutional Review Boards in academia has no teeth when it comes to private enterprise. Facebook, Google, and any other company, for that matter, are unconstrained by ethics as operationalized in rules largely developed for academics and for highly regulated industries such as pharmaceuticals.
What then does and what could constrain technology innovation?
One answer we discussed in class was the vision of a "three player game," as framed by the venture capitalist William Janeway, among government, the market, and financial capitalism. Government sets regulation; the market pays for goods and services; and investors—typically venture capitalists—place bets on the future profitability of companies. These bets provide the fuel for unsustainable innovation: the rapid growth of companies with no income, such as Google and Facebook before they began monetizing via digital advertising. They also allow companies to scale even while losing money on every transaction, as with low-cost ride-sharing apps or low-cost delivery services like Kozmo.com.
This gets us back to Facebook's suicide alert tool and European regulation of personal data. The government in this case protects individuals who provide Facebook a valuable and scarce asset: not money, as in Janeway's three player game, but their attention and their data.
Data-empowered technology companies, then, add two other "players" to the three player game in Janeway's description of general technology innovation: the users who supply their valuable data, and the data scientists who develop and deploy machine-learning algorithms to turn these data into revenue. Like cash, users and data scientists are scarce—companies need them to thrive, and neither is in infinite supply.
These forces constrain and guide private companies. We may wish the ethical principles of Belmont to hold, but these principles cannot be operationalized into standards and rules the way they are in academia (e.g., via IRBs). Instead, the constraining forces include government regulation, as the EU's GDPR illustrates, and the scarcity of finite resources: venture capital investment, consumer revenue (including the marketers who advertise on the platforms), the users who provide the data, and the data scientists who monetize these data.
Data-empowered technology companies are in some ways more open to external forces, should users decide, for example, that privacy or information security matter more than convenience and entertainment. Similarly, should technologists decide to move from advertising platforms to other sectors, much as developers flowed out of financial services after the economic collapse of 2008, this too would limit a finite resource that these companies, at least for the foreseeable future, require.
By the time we teach the class again next month, perhaps an entirely different drama among the five players (investors, government, consumers, users, and data scientists) will be playing out, but the rules of the game will likely still govern. Individual people occupy several direct roles in the game – some as users, some as data scientists and software engineers, and, less directly, all of us as voters who influence the government. It’s up to us collectively to make ethics guide and constrain corporate research and marketing.
Thanks to Professor Matt Jones for comments on an earlier version of this document.
Chris Wiggins is senior columnist of Voices at Columbia University Data Science Institute, an associate professor of applied mathematics at Columbia University and the Chief Data Scientist at The New York Times. For more data commentary and analysis, visit datascience.columbia.edu/voices.