Ethical Principles, OKRs, and KPIs: What YouTube and Facebook could learn from Tukey
April 2, 2018
The ugly truth is that we believe in connecting people so deeply that anything that allows us to connect more people more often is de facto good. It is perhaps the only area where the metrics do tell the true story as far as we are concerned.
– Facebook VP Andrew Bosworth, 18 June 2016, as leaked to BuzzFeed
“Watch time was the priority…Everything else was considered a distraction.”
– ex-Google engineer Guillaume Chaslot, as quoted in the Guardian, 2 Feb 2018, describing the sole KPI of YouTube’s recommendation engine
“Software is eating the world”, the venture capitalist Marc Andreessen warned us in 2011, and, more and more, the software eating our world is also shaping our professional, political, and personal realities via machine learning. Examples include the recommendation algorithms that select the items appearing in our social feeds, choose the next autoplay video on YouTube, and suggest ‘related’ products for purchase on Amazon.
In a paper[1] we’re reading in this week’s “Data: Past, Present, and Future” course[2], machine learning is defined as “computer systems that automatically improve through experience”. In the context of information platforms[3] such as Facebook, this means that a technologist has chosen how to quantify “improve”, and the “experience” is the logged interaction data from the millions of other users who have interacted with a product.
But what, then, are we to optimize?
At the scale of a corporation, enterprise goals like revenue or profit set overall company strategy; the art of product management in software companies is to relate these company goals to product goals, such as monthly active users (MAUs), or to user-level key performance indicators (KPIs), such as time spent in the application or the probability that a user clicks, swipes, or otherwise engages with content served by a recommendation algorithm.
Absent from the engineering and product work of KPI optimization is the challenge of relating company goals, product goals, and KPIs to ethical principles – a complexity which has dominated the last few months of popular news about information platforms[4]. Topics such as transparency, privacy, informed consent, harm to users, and security – including information security – are not easily related to ‘connecting more people’ and ‘watch time’, to use the two product KPIs mentioned above.
“When a measure becomes a target, it ceases to be a good measure” – Goodhart’s law
Over the past decades it has become far easier to optimize product metrics such as engagement on information platforms that draw content from across millions of active users. Algorithmically driven social media companies optimize for engagement in ways motivated by company goals. The awesome engineering infrastructure and high-performance machine learning models can distract from the important question faced by any metric-minded product team: what should we be optimizing?
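To make a single-KPI objective concrete, consider the following deliberately minimal sketch (in Python, with hypothetical names; a caricature, not any platform’s actual code) of a recommender whose only criterion is predicted watch time:

```python
import numpy as np

def recommend(predicted_watch_time, k=10):
    """Rank candidate videos purely by predicted watch time -- the sole KPI.

    Anything not captured by this one number (accuracy of the content,
    harm to the viewer, diversity of viewpoints) is invisible to the ranker.
    """
    scores = np.asarray(predicted_watch_time)
    return np.argsort(-scores)[:k]  # indices of the k highest-scoring videos

# Toy usage: four candidate videos and a model's watch-time estimates (minutes)
print(recommend([3.2, 11.5, 0.8, 7.1], k=2))  # -> [1 3]
```

Goodhart’s law then applies with full force: once this one score is the target, whatever content maximizes it – however it does so – is what gets served.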
In the quote above, Bosworth makes clear that the metric of connecting people is an end in itself. It is not a proxy for, e.g., stock price or financial metrics; nor is it a proxy for a mission-driven goal or an ethical principle. It is the KPI to be optimized, and product decisions, marketing decisions, and engineering decisions follow from that goal.
“The question before us is the ethics of leading people down hateful rabbit holes full of misinformation and lies at scale just because it works to increase the time people spend on the site – and it does work” – Zeynep Tufekci, as quoted in the Guardian, 2 Feb 2018
Of concern to users of these products is that such optimizations have unintended consequences that might act against users’ interests – short- or long-term, individual or collective – and that are not easily captured by product KPIs. It can be difficult for individual product managers and engineers to willingly slow down optimization of a metric and research the extent to which a company-sanctioned KPI does or does not advance the principles or larger goals for which the KPI should be a proxy. Moreover, given the complexities of human-computer interactions, networks of people, and dynamics of information, a KPI that initially advances these principles may soon come to thwart them.
The tension between advancing engineering and science while respecting benefits to individuals and to society has been at the heart of decades of careful thinking about how best to define ethical principles, and at what granularity or generality. Both in the research community and in information security, a few ethical principles have been set out to guide how we think about the impact of research and technology. As codified by the Belmont[5] and Menlo[6] reports, these may be summarized briefly as:

1. respect for persons;
2. beneficence;
3. justice;
4. respect for law and public interest.
To these, security-minded technologists may wish to add a fifth:

5. security, both information security and cognitive security[7].

This is particularly challenging for an ‘open’ platform, since, with growing complexity, an ‘open’ platform becomes a platform open to exploitation.
How, then, can engineers monitor and maintain awareness of which other KPIs, or which ethical or design principles such as the five above, are challenged by KPIs like ‘watch time’ or ‘number of connected users’?
Product problems are not just engineering problems. Engineering may be where technologists feel most comfortable, by virtue of identity or professional certification, but with the capability afforded by machine learning comes the necessity to think through the long-term impact of our technological choices. Technology allows us to separate capability from intent; since we cannot think through every possible intent driving a technology’s use, we must design to monitor, and to mitigate, uses that challenge our principles.
Part of this monitoring will not be quantitative. Since we cannot know in advance every phenomenon users will experience, we cannot know in advance what metrics will quantify those phenomena. To that end, data scientists and machine learning engineers must partner with, or learn the skills of, user experience research, giving users a voice. This can mean qualitative surveys, interviews, or other means of gathering data from users. Among engineers, using one’s own product in order to provide feedback is called ‘dogfooding’; however, information platforms must be designed knowing that not all users will share the interests and values of the technology’s creators.
More generally: make sure users have a voice, and ask, with empathy for their interests, what their experience actually is. It may vary greatly from what you hypothesized or designed.
Monitoring and mitigating experiences that contradict our design principles, including ethical principles, may require updating or iteratively re-evaluating the KPIs that drive digital products. It may be, for example, that connecting as many users as possible is only one proxy KPI in service of a less granular team goal (sometimes termed an ‘Objective and Key Result’, or OKR[8]) or an even less granular principle. Ideally, product and engineering KPIs and OKRs derive from these corporate and community principles just as laws and standards follow from legal principles[9]. In this sense, all KPIs are ‘wrong’: they are the easy-to-operationalize proxies for the less granular standards (or OKRs) or principles (or company goals) – ideally both – from which they derive.
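One way to keep that derivation explicit is to record, next to each KPI, the OKR it is a proxy for and the principle that OKR serves, so that every dashboard number can be asked ‘proxy for what?’. The sketch below is hypothetical – all names are invented, and no company is known to structure its metrics this way – but it shows how cheap the bookkeeping is:

```python
# Hypothetical proxy chain: principle -> OKR -> KPI. All names are invented.
PRINCIPLES = {
    "beneficence": "maximize benefits to users; minimize harms",
}

OKRS = {
    "healthy_engagement": {
        "principle": "beneficence",
        "objective": "users find the time they spend on the platform valuable",
    },
}

KPIS = {
    "watch_time": {
        "okr": "healthy_engagement",
        "caveat": "easy-to-operationalize proxy; re-evaluate as user experience shifts",
    },
}

def trace(kpi: str) -> str:
    """Walk a KPI back to the principle it is supposed to serve."""
    okr = KPIS[kpi]["okr"]
    principle = OKRS[okr]["principle"]
    return f"{kpi} -> {okr} -> {principle}: {PRINCIPLES[principle]}"

print(trace("watch_time"))  # watch_time -> healthy_engagement -> beneficence: ...
```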
Even within a machine learning approach, we can search for ways to optimize an initial KPI while constraining a separate KPI motivated by user research – e.g., exposure to a diversity of information or opinion – depending on what principles an information platform hopes to advance; a minimal sketch follows.
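As one such sketch – again with hypothetical names, and with a crude per-topic cap standing in for whatever diversity measure user research actually motivates – one might greedily optimize predicted watch time subject to a constraint on topic concentration:

```python
import numpy as np

def recommend_constrained(predicted_watch_time, topics, k=10, max_per_topic=2):
    """Pick the k candidates with the highest predicted watch time (primary KPI),
    subject to a cap on how many picks may share a topic (constraint KPI)."""
    order = np.argsort(-np.asarray(predicted_watch_time))  # best-first
    picked, per_topic = [], {}
    for i in order:
        if per_topic.get(topics[i], 0) < max_per_topic:
            picked.append(int(i))
            per_topic[topics[i]] = per_topic.get(topics[i], 0) + 1
        if len(picked) == k:
            break
    return picked

# Toy usage: the three highest-scoring items share one topic; the cap forces variety.
scores = [9.0, 8.5, 8.0, 4.0, 3.5]
topics = ["conspiracy", "conspiracy", "conspiracy", "news", "science"]
print(recommend_constrained(scores, topics, k=4))  # -> [0, 1, 3, 4]
```

Rethinking or even changing a product KPI in the face of changing user experience can be disruptive and frustrating to engineers and product owners; however, as John Tukey warned us[10]: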
“Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.”
That is:

Your KPIs can’t conflict with your principles if you don’t have principles.