DSI Research Scientist Savannah Thais Shares the Importance of Community-driven Data Research

November 17, 2022

Savannah Thais, a recently appointed Research Scientist at Columbia University’s Data Science Institute, is applying her background in high-energy particle physics toward modeling complex machine learning systems. Her focus is on the social impacts of model development, specifically how to minimize algorithmic biases through building research relationships between scientific disciplines.

“My work lies at the intersection between hard science (math, coding, model building) and social impact and policy,” Thais explains. “I want to bring ideas from how we think about model building, data representation, and evaluation in physics to other types of systems. In physics, we have an established history with careful and meticulous methods. I want to translate that rigor to other aspects of machine learning, as well as social systems and the public sector.”

Thais earned her PhD in Physics from Yale and then worked as an Associate Research Scholar at the Princeton Institute for Computational Science and Engineering (PICSciE) within the Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP). As part of her PhD research, Thais engaged with the ATLAS Experiment at the Large Hadron Collider (LHC) at CERN in Switzerland, using neural networks to look for rare physics processes. It was during that time that Thais first encountered machine learning, and discovered a fascination with the process and implications of model building.

Through her research interests, Thais developed a deeper understanding of the social impact of algorithms, particularly in issues around embedded biases. Her research emphasizes the importance of evaluating social impact from both a mathematical and community perspective.

“I’m interested not only in machine learning, but in activism, public policy, and public service,” Thais explains. “Machine learning processes are becoming embedded in more and more critical societal systems, and are used to make a range of social decisions with major impact. It’s important that those who create the models understand these implications. And it’s important that the models themselves can be audited so that we can understand how and why the decisions were made, and have the transparency and interpretability that are key to public sector decision-making. My approach is interdisciplinary and intersectional – I push for people to work with social scientists, legal scholars, and the communities that we’re deploying AI in. All of these stakeholders should be involved in the conversation.”

In 2020, in response to the COVID-19 pandemic and in an effort to put theory into practice, Thais founded Community Insight and Impact, a non-profit organization whose mission is to empower communities through equitable data analytics. Its kick-off project, the COVID-19 Community Vulnerability Index, shows data on community vulnerabilities by county as a tool for allocating healthcare resources more responsibly. The project brings together data insights from diverse fields, including urban planning, public health, sociology, mental health research, and community advocacy groups, reflecting Thais’ approach to incorporating interdisciplinary input in model building.

At Columbia, Thais was drawn to the collaborations that DSI offers researchers. DSI Research Scientists are not wedded to a particular department, and are encouraged to build relationships across the university. “It helped me feel like my approach would fit in here,” Thais observed when asked about the campus community. “I feel lucky that Columbia is interested in this interdisciplinary niche.”

— Karina Alexanyan, PhD and Shane Tan