Social Justice

About the Focus Area

Data science has a key role to play as we build a more equitable and inclusive world.

Leaders across all sectors increasingly embrace the power of data to tackle pressing social justice challenges. From income inequality and incarceration, to immigration and beyond, data science methods are used to gain insights, to improve decision-making, and to support the creation of scalable solutions that will positively impact society.

Automated decision systems are intended to ensure fair and equitable treatment of people, i.e., decision-making that is not subject to human bias, emotion, fatigue, faults, years of experience, etc. (e.g., some judges are more lenient than others). Ironically, these very solutions use historical data (e.g., all past court judgments) and thus continue to reflect human frailties, most notably historical and systemic bias.

The effective use of data science methods to address systemic inequality and to reduce variations in judgment across humans requires more than the adoption of the right technical approaches to the right data sets about people, institutions, communities, and systems. Collaborations between data scientists and domain experts in other disciplines, particularly the social sciences, are essential to design multifaceted, human-centered, and ethical solutions and prevent or minimize biased, inappropriate, or unintended outcomes. Interdisciplinary approaches to social justice shift the very questions we ask and how we interpret the results.

Through DSI, thought leaders from across Columbia combine techniques from the growing field of data science with their own subject-matter expertise to reshape thinking and co-create practical solutions for a more just world.

For example, an interdisciplinary research team including researchers from social work, nursing, and data science is designing an AI system to detect and assess risk for child abuse and neglect, while another team with health policy, environmental health, and electrical engineering expertise is creating personalized behavioral interventions to improve access to health care in low-income communities. Our research scientists are also developing tools to authenticate digital media in the “fake news” era, gaining new insights on gender and racial/ethnic inequality in the labor market, and using natural language processing to help reduce racial and gender achievement gaps in STEM fields.

DSI-affiliated faculty have also developed innovative, interdisciplinary curricula to embed data science into social justice-related courses for undergraduate and graduate students, including history, social work, and public policy courses. Teams of M.S. in data science students have partnered with social work and psychology faculty to understand Twitter activity before and after police use of force against unarmed Black victims, developed a cost-benefit model to measure violence intervention efforts in Baltimore, Chicago, and New York, and defined gentrification trends and uncovered insights on its spread throughout New York City.

Related Centers

Social Justice

Data, Media and Society

Cybersecurity

Research Highlights

- Organizations focused on reducing gang violence struggle to keep up with the growing complexity of social media platforms and the sheer volume of data they present. This research describes the Digital Urban Violence Analysis Approach, a qualitative analysis method used in a collaboration between data scientists and social work researchers for decoding the high-stress language of urban, gang-involved youth. Applying natural language processing techniques, this team created automated tools with the potential to detect aggressive language on social media and aid individuals and groups in performing violence prevention and interruption.
- Desmond Patton, Social Work
- Kathleen McKeown, Computer Science
- Owen Rambow, Computational Learning Systems
- Jamie Macbeth, SAFELab
- Personalized approaches to behavioral interventions, known as nudges, may improve access to health care in low-income communities. Using health, environment, transportation, and financial data, this project is building smart nudges that adapt to individual needs by using innovative methods in machine learning and data science.
- John Paisley, Electrical Engineering
- Kai Ruggeri, Health Policy and Management
- Marianthi-Anna Kioumourtzoglou, Environmental Health Sciences
- Child abuse and neglect is a social problem that has reached epidemic proportions. The broad adoption of electronic health records in clinical settings offers a new avenue for addressing this epidemic. This team is developing an innovative artificial intelligence system to detect and assess risk for child abuse and neglect within hospital settings that prioritizes the prevention and reduction of bias against Black and Latinx communities.
- Maxim Topaz, Nursing
- Aviv Landau, Data Science Institute
- Desmond Patton, Social Work
- This research team combines new sources of labor market data, including online resumes and employee reviews, with data science methods to identify factors and environments that shape gender and racial inequality in the high-skilled labor market. The team is charting long-term career trajectories of a large number of high-skilled American workers, examining gender and racial variations, constructing measures of company environment that pertain to gender and racial equity, and assessing consequences for the career path of different groups of skilled workers.
- Kriste Krstovski, Data Science Institute, Business
- Yao Lu, Sociology
- Disaster events often occur in remote, hard-to-access regions, with conditions made more difficult by the evolving crisis. Groups such as the International Organization for Migration and the Internal Displacement Monitoring Centre rely on field-based estimates from humanitarian groups, media reports, and, in rare instances, survey data to get a handle on the numbers and characteristics of those displaced for planning humanitarian assistance. However, data are often delayed and fragmentary and require triangulation, affecting decision making and the ability to track trends in displacement as well as return or local integration. This research is processing data from multiple sources to provide a synoptic view of disaster displacement globally, broken down by type and location.
- Robert S. Chen, The Earth Institute
- This team hypothesizes that broad accessibility to ultrasound imaging can revolutionize health care in developing nations, and generate data necessary for training AI-based mobile diagnostic tools. These tools could help detect and diagnose numerous medical conditions from premature labor to cardiovascular and orthopedic malignancies. The team is developing a handheld, USB-powered or battery-powered device that generates a 3D sonogram and streams images in the same format as webcam video, which can be displayed or processed by diagnostic software. They aim to ensure all parts and components may be obtained for less than $100.
- Hod Lipson, Mechanical Engineering
- This team explores what a user-centric framework for machine learning (ML) might look like. Their research puts ML systems in historical and current sociopolitical contexts. They are defining what decentralized and “deliberatively democratized” ML entails; identifying a set of use cases and scenarios to see how democratized ML can be deployed with a critical focus on preserving the rights and liberties of marginalized communities; and mapping end-user scenarios to active research and implementation tools.
- Josh Whitford, Sociology
- Kiran Samuel, Sociology Doctoral Student
- AI-empowered systems are performing at or beyond human-level capability and are being used to help make decisions that have life-changing consequences for individuals and society through a multitude of applications: hiring, admissions, policing, recidivism, bank loans, self-driving cars, and medical treatment. This research effort aims to specify and verify machine-learned models for trust properties, including fairness, robustness, and privacy, using formal methods. Formal methods provide provable guarantees that a property holds for a class of related behaviors, obviating the need to painstakingly test each behavior one at a time.
- Shipra Agrawal, Industrial Engineering and Operations Research
- Roxana Geambasu, Computer Science
- Daniel Hsu, Computer Science
- Suman Jana, Computer Science
- Carl Vondrick, Computer Science
- Jeannette M. Wing, Data Science Institute, Computer Science
- Junfeng Yang, Computer Science
- The social web is driven by feedback mechanisms (“likes”) that emotionalize the sharing culture and may contribute to the formation of echo chambers and political polarization. This team is building a complementary mechanism for web-based sharing of reasoned judgments. This mechanism uses machine learning algorithms to perform probabilistic inference on contentious claims, with the aim of bringing rationality to the social web.
- Chris Wiggins, Applied Physics, Applied Mathematics
- Nikolaus Kriegeskorte, Psychology, Zuckerman Institute
- Nima Mesgarani, Electrical Engineering
- Fact-checking is a journalistic practice that compares a claim made publicly against trusted sources of facts. This project extends the POLITIFACT.com LIAR dataset by automatically extracting the justification from the fact-checking article used by humans to label a given claim. The research shows that modeling the extracted justification in conjunction with the claim (and metadata) provides a significant improvement regardless of the machine learning model used (feature-based or deep learning) both in a binary classification task (true, false) and in a six-way classification task (pants on fire, false, mostly false, half true, mostly true, true).
- Smaranda Muresan, Data Science Institute
- This work strengthens digital media security by developing tools for authenticating the source of digital media, verifying its publication date and revision history, and integrity-checking specific content elements. The project’s approach is to design, test, and implement novel cryptographic tools to secure modern digital publication systems, via new user interface elements that convey the authenticity and integrity guarantees these tools support. The team is creating tools for online publishers (e.g., news outlets, government agencies) to mitigate misinformation threats and increase public trust, thus providing solutions to problems in information authenticity, applied cryptography, and human-computer interaction.
- Susan McGregor, Data Science Institute
- Reducing the achievement gaps in STEM disciplines among subpopulations of students is important for the U.S. to meet its 21st century science and technology needs. This project focuses on environmental factors that devalue, marginalize, or discriminate against students based on a social identity like race, gender, disability status, or socioeconomic status. To date, the research has synthesized and systematically analyzed data from interventions shown to help reduce the impact of social identity threats on student participation in STEM, and has applied the results of the synthesis and analyses to enhance existing interventions.
- Smaranda Muresan, Data Science Institute
- This course helps students from the School of Engineering and Applied Sciences and Columbia College understand our civilization of data and become critical and effective participants in it. It assumes no prerequisites and is open to students at all levels, including first-year students. Materials combine traditional “functional literacy” materials with material to sharpen critical and rhetorical literacy and understanding of the context and assumptions underlying data-driven experiences and narratives, and how to integrate analysis and modeling of data as part of contemporary discourse.
- Chris Wiggins, Applied Physics, Applied Mathematics
- Matthew Jones, History
- The divide between social workers and data, as well as between data scientists and work for social good, hinders our ability to bridge the gap between the fields and inform interdisciplinary methodologies for developing interventions for social problems. This course prepares a new cadre of interdisciplinary social work and data science students who learn to effectively engage in interdisciplinary collaboration utilizing computational skills and deep contextual knowledge of the conditions and factors related to our most pressing social problems.
- Desmond Patton, Social Work
- Tian Zheng, Statistics
- The course introduces School of International and Public Affairs students to computational thinking, including Python programming, and teaches students to apply that way of thinking to public policy issues. Course participants have applied this new knowledge in their capstone projects, during which they address real-world policy and management challenges for external clients.
- Merit Janow, International and Public Affairs
- Dan McIntyre, International and Public Affairs
- This program adds ethical teaching to current undergraduate computer science courses and is compiling these modules into a textbook. The initial set of courses augmented by ethics modules included Machine Learning, Human Computer Interaction, and Networks and Crowds.
- Augustin Chaintreau, Computer Science
- This series of one-hour talks is open to the entire Columbia community and features distinguished speakers who are grappling with the challenge of ensuring that data science serves the public good. Topics include financial systems risk, interpretability and discrimination in machine learning, different definitions of fairness and privacy, and equitable access to digital technology.
- This team applied topic modeling and sentiment analysis techniques to Twitter activity before and after events related to police use of force against unarmed Black victims. They focused on more than 8.5 million tweets from August 2014 covering the shooting of a Black victim (Michael Brown) by a police officer (Darren Wilson) in Ferguson, MO. Their study demonstrated that there are lags between events and the emotional response to them on Twitter.
- Wayne Leach, Psychology, Barnard College
- Courtney Cogburn, Social Work
- This team was tasked with defining gentrification trends through publicly available datasets that uncover insights on the spread of gentrification across 55 Public Use Microdata Areas (PUMA) districts in New York City. The four most important factors were: real estate price, race of inhabitants, employment, and education.
- Hardeep Johar, Industrial Engineering and Operations Research
- Patrice Derrington, Architecture, Planning and Preservation
- This team built text-based predictive methods to assess the trustworthiness of news articles and to study the relationship between story trustworthiness and the reactions that the stories elicit through social media.
- Sarah Ita Levitan, Computer Science Postdoctoral Research Scientist
- This team combined gun-crime and census data for New York, Chicago, and Baltimore to understand the interplay between crime, poverty, and other factors. The interactive tool they built lets users explore the data and put several trends in sharp focus: Chicago gun-crime peaks annually in July and tapers off in February, New York had the lowest gun-crime rates of the three cities, and gun-crime in Baltimore has increased.
- Desmond Patton, Social Work
- Owen Rambow, Computational Learning Systems
- This team developed a cost-benefit model to measure the effects of Cure Violence’s intervention efforts in Baltimore, Chicago, and New York. They found that for each murder averted, governments would save $440,000, mostly by avoiding the cost of imprisoning someone for life. The team also developed a heat map to track gun-violence incidents in Baltimore, Chicago, and New York, using open-source data and code to help prioritize target areas.
- Desmond Patton, Social Work
- Owen Rambow, Computational Learning Systems