The ugly truth is that we believe in connecting people so deeply that anything that allows us to connect more people more often is de facto good. It is perhaps the only area where the metrics do tell the true story as far as we are concerned.

– Facebook VP Andrew Bosworth, 18 June 2016, as leaked to BuzzFeed

“Watch time was the priority… Everything else was considered a distraction.”

– former Google engineer Guillaume Chaslot, as quoted in the Guardian, 2 Feb 2018, describing the sole KPI of YouTube’s recommendation engine

“Software is eating the world”, the venture capitalist Marc Andreessen warned us in 2011, and, more and more, the software eating our world is also shaping our professional, political, and personal realities via machine learning. Examples include the recommendation algorithms that select what appears in our social feeds, choose the next autoplay video on YouTube, or suggest ‘related’ products for purchase on Amazon.

In a paper [1] we’re reading in this week’s “Data: Past, Present, and Future” course [2], machine learning is defined as “computer systems that automatically improve through experience”. In the context of information platforms [3] such as Facebook, this means that a technologist has chosen how to quantify “improve”, and the “experience” is the logged interaction data from the millions of other users who have interacted with a product.
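
To make that concrete, here is a deliberately toy sketch in Python – invented field names, no resemblance to any real platform’s code – of how the choice of what “improve” means is literally a one-line decision: the system ranks candidate items by whatever quantity it estimates from the logged interactions, and swapping that quantity swaps what the product optimizes.

    # A toy sketch only: hypothetical log records and scoring functions,
    # not any platform's actual system. "Experience" is the logged
    # interaction data; "improve" is whichever score we choose to rank by.

    def predicted_watch_time(item, logs):
        # Toy estimate: the item's average historical watch time.
        times = [row["watch_seconds"] for row in logs if row["item"] == item]
        return sum(times) / len(times) if times else 0.0

    def rank_feed(candidates, logs, score=predicted_watch_time):
        # The KPI lives here: swap `score` and you swap what the product optimizes.
        return sorted(candidates, key=lambda item: score(item, logs), reverse=True)

    logs = [{"item": "video_a", "watch_seconds": 300},
            {"item": "video_b", "watch_seconds": 45}]
    print(rank_feed(["video_b", "video_a"], logs))  # ['video_a', 'video_b']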

But what, then, are we to optimize?

At the scale of a corporation, enterprise goals like revenue or profit set overall company strategy; the art of product managers in software companies is to relate these company goals to product goals such as monthly active users (MAUs) or user-level key performance indicators (KPIs) such as time spent in the application, or the probability a user clicks, swipes, or otherwise engages with content served by a recommendation algorithm.

Absent from the engineering and product challenges of KPI optimization is the challenge of relating company goals, product goals, and KPIs to ethical principles – a complexity which has dominated the last few months of popular news about information platforms [4]. Topics such as transparency, privacy, informed consent, harm to users, and security – including information security – are not easily related to ‘connecting more people’ and ‘watch time’, to use the two product KPIs mentioned above.

Machine learning and product mindset

“When a measure becomes a target, it ceases to be a good measure” – Goodhart’s law

Over the past decades it has become far easier to optimize product metrics such as engagement in information platforms that draw content from across millions of active users. Algorithmically-driven social media companies optimize for engagement in ways motivated by company goals. The awesome engineering infrastructure and high-performance machine learning models can distract from the important question faced by any metric-minded product team: what should we be optimizing?

In the quote from Bosworth, above, he makes clear that the metric of connecting people is an end in itself. It is not a proxy for, e.g., stock price or financial metrics; nor is it a proxy for a mission-driven goal or an ethical principle. It is the KPI to be optimized, and product decisions, marketing decisions, and engineering decisions follow from that goal.

Ethical mindset

“The question before us is the ethics of leading people down hateful rabbit holes full of misinformation and lies at scale just because it works to increase the time people spend on the site – and it does work” – Zeynep Tufekci, as quoted in the Guardian 2 Feb 2018

Of concern to users of these products is that such optimizations have unintended consequences which might act against users’ interests – short or long-term, individual or collective – and which are not easily captured by product KPIs. It can be difficult for individual product managers and engineers to willingly slow down optimizing a metric and research the extent to which a company-sanctioned KPI does or does not advance the principles or larger goals for which the KPI should be a proxy. Moreover, given the complexities of human-computer interactions, networks of people, and dynamics of information, a KPI which may initially advance these principles may soon come to thwart them.

The tension between advancing engineering and science while respecting benefits to individuals and to society has been at the heart of decades of careful thinking about how best to define ethical principles, and at what granularity or generality. Both in the research community and in information security, a few ethical principles have been set out to guide how we think about the impact of research and technology. As codified by the Belmont [5] and Menlo [6] reports, these may be summarized briefly as:

  1. Respect for Persons:
    • informed consent;
    • respect for individuals’ autonomy;
    • respect for individuals impacted;
    • protection for individuals with diminished autonomy or decision making capability.
  2. Beneficence:
    • do not harm;
    • assess risk.
  3. Justice:
    • equal consideration;
    • fair distribution of benefits of research;
    • fair selection of subjects;
    • equitable allocation of burdens.
  4. Respect for Law and Public Interest:
    • legal due diligence;
    • transparency in methods and results;
    • accountability.

To these, security-minded technologists may wish to add

  • Security

both information security and cognitive security [7]. This is particularly challenging for an ‘open’ platform, since, with growing complexity, an ‘open’ platform becomes a platform open to exploitation.

How, then, can engineers monitor and maintain awareness of what other KPIs, or what ethical or design principles such as the five above, are challenged by KPIs such as ‘watch time’ or ‘number of connected users’?

User experience research

Product problems are not just engineering problems. Engineering might be where technologists feel most comfortable, by virtue of identity or professional certification, but with the capability afforded by machine learning comes the necessity to think through the long-term impact of our technological choices. Technology allows us to separate capability from intent; since we cannot think through every possible intent driving a technology’s use, we must design to monitor and to mitigate uses that challenge our principles.

Part of this monitoring will not be quantitative. Precisely because we cannot know in advance every phenomenon users will experience, we cannot know in advance what metrics will quantify these phenomena. To that end, data scientists and machine learning engineers must partner with or learn the skills of user experience research, giving users a voice. This can mean qualitative surveys, interviews, or other means of gathering data from users. Among engineers, using the product in order to provide feedback is called ‘dogfooding’; however, information platforms must be designed knowing that not all users will have the same interests and values as the technology’s creators.

More generally: make sure users have a voice, and ask, with empathy for their interests, what their experience actually is. It may vary greatly from what you hypothesized or designed.

All KPIs are wrong, but some are useful

Monitoring and mitigating experiences which contradict our design principles, including ethical principles, may require updating or iteratively re-evaluating the KPIs that drive digital products. It may be, for example, that connecting as many users as possible is only one proxy KPI in service of a less granular team goal (sometimes termed an ‘Objective and Key Result’, or OKR [8]) or an even less granular principle. Ideally, product and engineering KPIs and OKRs derive from these corporate and community principles, just as laws and standards follow from legal principles [9]. In this sense, all KPIs are ‘wrong’: they are the easy-to-operationalize proxies for the less granular standards (or OKRs) or principles (or company goals), or ideally both, from which they derive.

Even within a machine learning approach, we can search for ways an initial KPI can be optimized while constraining a separate KPI motivated by user research – e.g., exposure to a diversity of information or opinion – depending on what principles an information platform hopes to advance (a minimal sketch follows the quote below). Rethinking or even changing a product KPI in the face of changing user experience can be disruptive and frustrating to engineers and product owners; however, as John Tukey warned us [10]:

“Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.”
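
To make the constrained-KPI idea above concrete, here is a minimal sketch with invented item fields (“engagement”, “topic”) and a deliberately crude greedy heuristic; it is not any platform’s actual ranking system. The primary KPI (predicted engagement) is optimized subject to a secondary, user-research-motivated constraint: a floor on the topical diversity of the served slate.

    # A hypothetical sketch: optimize the primary KPI (predicted engagement)
    # subject to a secondary constraint (minimum number of distinct topics).
    # Assumes min_topics <= k; field names are invented for illustration.

    def diverse_slate(candidates, k, min_topics):
        """Greedily pick k items by engagement while guaranteeing >= min_topics topics."""
        by_engagement = sorted(candidates, key=lambda c: c["engagement"], reverse=True)
        slate, topics = [], set()
        # First pass: reserve slots so the diversity constraint can be met.
        for c in by_engagement:
            if len(topics) < min_topics and c["topic"] not in topics:
                slate.append(c)
                topics.add(c["topic"])
        # Second pass: fill the remaining slots purely by the primary KPI.
        for c in by_engagement:
            if len(slate) == k:
                break
            if c not in slate:
                slate.append(c)
                topics.add(c["topic"])
        return slate

    items = [
        {"id": 1, "engagement": 0.9, "topic": "politics"},
        {"id": 2, "engagement": 0.8, "topic": "politics"},
        {"id": 3, "engagement": 0.7, "topic": "science"},
        {"id": 4, "engagement": 0.6, "topic": "arts"},
    ]
    print(diverse_slate(items, k=3, min_topics=2))  # engagement-ranked, >= 2 topics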

What is to be done?

Your KPIs can’t conflict with your principles if you don’t have principles.

That is:

  1. Start by defining your principles. I’d suggest the five above, which are informed by the collective research of the authors of the Belmont and Menlo reports on ethics in research, augmented by a concern for the safety of a product’s users. The choice of principles is important, as is the choice to define them in advance, so that they guide your company from high-level corporate goals down to individual product KPIs.
  2. Next: before optimizing a KPI, consider how this KPI would or would not align with your principles. Then document that alignment and communicate it, at least internally, if not externally to users or publicly online.
  3. Next: monitor user experience, both quantitatively and qualitatively (a minimal sketch of such quantitative monitoring follows this list). Consider what unexpected user experiences you observe, and how, irrespective of whether your KPIs are improving, your principles are challenged.
  4. Repeat: these conflicts are opportunities to learn and grow as a company: how do we re-think our KPIs to align with our OKRs, which should derive from our principles? If you find yourself saying that one of your metrics is the “de facto” goal, you’re doing it wrong.
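
As a sketch of the quantitative half of step 3 – metric names invented for illustration – the idea is to track, alongside the KPI you optimize, “guardrail” metrics chosen to reflect your principles (here, harm proxies for which an increase is a regression), and to flag any change in which the KPI improves while a guardrail degrades.

    # A minimal, hypothetical sketch: metric names and thresholds are invented.
    # Guardrails here are harm proxies (higher is worse), chosen to reflect
    # principles rather than the optimized KPI.

    def review_release(before, after, kpi, guardrails, tolerance=0.02):
        # `before`/`after` are dicts of metric name -> value from two snapshots.
        kpi_improved = after[kpi] > before[kpi]
        degraded = [g for g in guardrails
                    if after[g] > before[g] * (1 + tolerance)]
        if kpi_improved and degraded:
            return f"KPI '{kpi}' improved, but guardrails degraded: {degraded}"
        return "ok"

    before = {"watch_time": 31.0, "reported_content_rate": 0.010, "opt_out_rate": 0.050}
    after  = {"watch_time": 34.0, "reported_content_rate": 0.014, "opt_out_rate": 0.050}
    print(review_release(before, after, kpi="watch_time",
                         guardrails=["reported_content_rate", "opt_out_rate"]))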

    Chris Wiggins is a senior columnist for Voices at the Columbia University Data Science Institute, an associate professor of applied mathematics at Columbia University, and the Chief Data Scientist at The New York Times.

  1. Jordan, Michael I., and Tom M. Mitchell. “Machine learning: Trends, perspectives, and prospects.” Science 349, no. 6245 (2015): 255-260.
  2. data-ppf.github.io
  3. Hammerbacher, Jeff. “Information platforms and the rise of the data scientist.” Beautiful Data (2009): 73-84.
  4. e.g., http://www.nytimes.com/2018/03/24/technology/google-facebook-data-privacy.html
  5. Department of Health, Education, and Welfare. “The Belmont Report. Ethical principles and guidelines for the protection of human subjects of research.” The Journal of the American College of Dentists 81, no. 3 (2014): 4.
  6. Dittrich, David, and Erin Kenneally. “The Menlo Report: Ethical principles guiding information and communication technology research.” US Department of Homeland Security (2012).
  7. Waltzman, Rand. The Weaponization of Information: The Need for Cognitive Security. RAND, 2017.
  8. Popularized by Google, the use of “objectives and key results” to relate strategy to tactics was introduced by the celebrated engineer-turned-CEO Andy Grove; cf., e.g., Grove, Andrew S. High output management. Vintage, 2015. For example, an OKR might be “distill company ethical strategy to 5 principles, and publish these online before Q3.”
  9. Solum, L. B. “Legal theory lexicon: Rules, standards, and principles.”, http://lsolum.typepad.com/legaltheory/2009/09/legal-theory-lexicon-rules-standards-and-principles.html
  10. Tukey, John W. “The future of data analysis.” The Annals of Mathematical Statistics 33, no. 1 (1962): 1-67.