Explore cross-disciplinary data science research projects from 10 Columbia student team finalists as they compete for awards.


March 5, 2021 (11:00 AM – 1:00 PM ET) – Online Event

Hosted By

DSI Education Working Group

About the Competition

The student teams that presented during this inaugural competition event were selected as finalists by the DSI Education Working Group committee from among many impressive nominations. Following the presentations, a panel of esteemed faculty from across Columbia's campuses served as judges. Finalists were judged on several criteria, including creative use of data science, potential societal impact, and alignment with DSI's "Data for Good" mission.

This event provided a snapshot of how Columbia students are applying data science across many disciplines, including computer science, electrical engineering, statistics, public health, comparative literature, and more. It highlighted successful student teamwork, problem solving, and innovation in data science methods.

This event was chaired and moderated by Tian Zheng, Professor and Chair of the Department of Statistics at Columbia University and Co-Chair of the DSI Education Working Group.

Columbia University Faculty Judges

The 5 Winning Teams

The Columbia Language Justice Perspectives Project

  • Team Members: Nikita Desir; Kyra Ann Dawkins
  • Course: Multilingual Technologies and Language Diversity
  • Instructors: Smaranda Muresan and Isabelle Zaugg

R Story – Empowering Communities for Resilience and Sustainable Growth, in partnership with the Opportunity Project (a Census Bureau initiative)

  • Team Members: Asahi Alexis Nino; Michelle A. Zee; Kyung Suk Lee; Gretchen Streett; Alisha Gurnani; Alison Ryland
  • Course: Practicum in Data Analysis
  • Instructor: Aracelis Torres

Network Characterization of Phishing Attacks

  • Team Members: Elisa Luo; Liane Young
  • Course: Topics in Information Processing
  • Instructors: Asaf Cidon and Ethan Katz-Bassett

Estimating the Incidence of Sexual Assault on College Campuses

  • Individual Contribution: Casey Bradshaw
  • Course: Foundations of Graphical Models
  • Instructor: David Blei

Intelligent Forecasting for COVID-19 in Collaboration with KPMG

  • Team Members: Bolim (Sydney) Son; Andrew Thvedt; Louisa Ong; Zhirui (Ariel) Luo; Jiyeon (Jen) Woo
  • Course: Practicum in Data Analysis
  • Instructor: Aracelis Torres

Project Descriptions & Presentations

Attacks on Aid: Quantifying Risk of Violence Toward Aid Workers in Global Humanitarian Settings

Attacks on aid workers in complex crises have been steadily increasing, with a record number of workers killed, kidnapped, or wounded in recent years. Through interactive data visualization and a case study on Afghanistan, our website illustrates the risks of being an aid worker in conflict settings. By characterizing violence against aid workers, our project aims to help humanitarian organizations better protect their staff. As humanitarians often say, saving lives should never cost lives.

Team Members: Natalie Boychuk; Alisha Sarakki; Kailey Rishvod; Emily Bamforth; Brennan Bollman

2020 US Presidential Election Exploration in R

The 2020 U.S. Presidential Election was bound to be eventful amid the COVID-19 pandemic. The surge in mail-in ballots stirred considerable controversy over election integrity, but were the results really unexpected, or even fraudulent, as Trump claimed? Many analyses have discussed how American economic conditions, racial unrest, and Trump's COVID-19 response affected his odds of reelection. Our project took a different perspective. It examined the finances of the presidential campaigns and offered new insights into how well public fundraising and expenditures explain why Trump lost the 2020 election.

Team Members: Jin Qian; Wenjie Zhu; Yibai Liu

NYC Employment Analysis 

NYC might just be the city with the richest employment landscape in the world. From blue-collar workers to corporate executives, with strong representation from almost every ethnicity, New York has it all. Take a brief walk with Xinyi and Eugenio as they explain their exploratory data analysis and visualization project about the capital of the world. Along the way, learn how examining the intersections of race, gender, and employment sector can be a force for good.

Team Members: Xinyi Liu; Eugenio Beaufrand

R Story – Empowering Communities for Resilience and Sustainable Growth, in partnership with the Opportunity Project (a Census Bureau initiative)

Your community has a story – feel supported when telling it. Through R Story’s interactive dashboard, rural leaders can easily access their community’s data in a format that is ready to present to entrepreneurs, developers, and future residents looking for their next location. With R Story, community leaders can leverage their data to obtain resources for their community and sustainably build their economy.

Team Members: Asahi Alexis Nino; Michelle A. Zee; Kyung Suk Lee; Gretchen Streett; Alisha Gurnani; Alison Ryland

Intelligent Forecasting for COVID-19 in Collaboration with KPMG

At a time when COVID-19 has made daily life unpredictable, this project aims to help people navigate that instability by predicting COVID-19 deaths and individual activity across the U.S. Through our interactive website, businesses and other users can explore the pandemic's historical and predicted data by city and state, supporting an efficient and safe reopening for all.

Team Members: Bolim (Sydney) Son; Andrew Thvedt; Louisa Ong; Zhirui (Ariel) Luo; Jiyeon (Jen) Woo

The Columbia Language Justice Perspectives Project

Translation in all its forms involves both beauty and tension. Machine translation, however, is a true double-edged sword: in the fight for language justice, it can either protect or endanger digital multilingual experiences. The Columbia Language Justice Perspectives Project presents these nuanced global stakes of equity in translation through an interactive map. The map places multilingual reflections in context and highlights discrepancies in Google Translate's output. Considerations of language justice and translation should be engaging and accessible. They should also hold machine translation technologies accountable for meeting the needs of language communities.

Team Members: Nikita Desir; Kyra Ann Dawkins

Characterizing Walt Whitman’s Stylistic Changes in Leaves of Grass 

Walt Whitman is an iconic American poet, famous for his poetry collection Leaves of Grass. Literary scholars have long established that his style changed significantly over his 40-year writing career. This project applies quantitative tools from natural language processing to various editions of Leaves of Grass to characterize Whitman's stylistic changes along several dimensions. These tools extend readily to many other literary works and can help readers better understand what they read.

Individual Contribution: Jieyan Zhu

Perovskite Stability Prediction

In recent years, many researchers have explored a wide range of promising technical solutions and policy options to lower the environmental cost of energy produced from solar, wind, biomass, and geothermal resources. Across these research areas, discovering and deploying novel materials is critical to scaling any sustainable energy effort, such as converting sunlight into energy or building batteries to store that energy. Building on the success of previous research, our project, Perovskite Stability Prediction, aims to use data and machine learning techniques to better identify perovskite materials suitable for sustainable energy applications.

Team Members: Caroline Rutherford; William Yu; Yiming Huang; Jiaying Chen; Seung-Jae Bang; Weixi Yao

Network Characterization of Phishing Attacks

Phishing attacks are among the most widespread and persistent threats to cybersecurity. By impersonating trustworthy entities, attackers trick victims into disclosing sensitive information such as passwords and credit card numbers. Most attackers know current phishing defenses well and can easily bypass them. Identifying and understanding the network-level characteristics of phishing emails is therefore key to modernizing cybersecurity. It offers a novel and more robust way to keep phishing emails from reaching vulnerable targets.

Team Members: Elisa Luo; Liane Young

Estimating the Incidence of Sexual Assault on College Campuses

Each year, U.S. colleges and universities are required to disclose the number of reported sexual assaults on their campuses. However, sexual assault is widely believed to be underreported. Any given number of reported assaults is consistent with many combinations of reporting rate and true total number of assaults. For example, a low reporting rate with many assaults can produce the same count as a high reporting rate with few. This project aims to disentangle those two quantities and produce plausible estimates of the total number of assaults occurring in a given year. Such estimates make campus crime statistics easier to interpret. They may also help inform policy decisions about campus initiatives for sexual assault prevention and awareness.

Individual Contribution: Casey Bradshaw