Nearly 300 Students Compete in Data Science Hackathon


Three teams won thousands of dollars during the 2017 Columbia Data Science Hackathon, a 21-hour competition held recently at Columbia University.

Nearly 300 Columbia students worked long into the night, using advanced modeling and visualization techniques to analyze and interpret data and communicate their findings. The students formed teams and analyzed one of three data sets provided by corporate sponsors: one from Enigma, a data management company; another from Bloomberg; and a third from Digital Reasoning, a cognitive computing company. The next morning, 39 teams presented their projects to a panel of 12 judges, which picked three winning teams.

1st Prize, $3,000 - Unusual Suspects

First prize went to a team that called itself the Unusual Suspects. The team analyzed emails, supplied by Digital Reasoning, from the Enron Corporation, an energy company driven to bankruptcy by scandal in 2001. The team consisted of three master’s students in the Data Science Institute (DSI): Moorissa Tjokro, Arman Uygur, and Jonathan Galsurkar.

The team used a deep-learning-based document embedding method to measure semantic similarity between emails and to flag employees whose writing resembled that of Enron's guilty employees.

“We developed a program that can infer a score for the potential guilt (or innocence) of an employee, not previously found guilty,” says Galsurkar. The team took all of the emails sent by Enron employees who had not been found guilty and compared their semantic content with that of emails sent by employees who had.

“A find that validated our results [was] that the emails of many guilty employees were in fact semantically similar, indicating that semantic similarity can be a good potential measure of guilt,” adds Galsurkar.
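
The idea of scoring one sender's emails by their similarity to a reference set can be illustrated with a minimal sketch. This is not the team's program: the article does not describe their code, and they used a deep-learning document embedding. The sketch below substitutes simple word counts for the embedding and compares them with cosine similarity. The function names and the example sentences are invented for illustration only.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a document embedding: lowercase word counts.
    (Assumption: the team used a learned embedding, not word counts.)"""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_score(candidate_emails, reference_emails):
    """Mean pairwise cosine similarity between a sender's emails
    and a reference set of emails."""
    pairs = [(c, r) for c in candidate_emails for r in reference_emails]
    if not pairs:
        return 0.0
    total = sum(cosine_similarity(embed(c), embed(r)) for c, r in pairs)
    return total / len(pairs)


# Invented example sentences, not taken from the Enron data set.
reference = ["move the losses off the books before the audit"]
sender_a = ["we should move those losses off the books"]
sender_b = ["lunch is at noon in the cafeteria"]

score_a = similarity_score(sender_a, reference)
score_b = similarity_score(sender_b, reference)
print(score_a > score_b)
```

In this toy setup, the sender whose wording overlaps more with the reference set receives the higher score. A learned embedding would aim to capture similarity in meaning even when the exact words differ, which word counts cannot do.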

2nd Prize, $2,000 - Noob Network

The second prize of $2,000 was won by the team Noob Network, which used network-based analysis and natural language processing to identify Enron employees who were potentially involved in unethical and fraudulent behavior.

On the team were DSI students Adarsh Chavakula, Gaurav Singh, Somya Singhal, and Vinay Kale.

3rd Prize, $1,000 - The Hedgers

Third prize went to The Hedgers, who analyzed government contract data supplied by Enigma. They used machine learning to look for causal relationships between commodity spot returns and types of government spending. Based on the relationships they identified, the team developed trend-following strategies, intended to help investment managers outperform markets, that use government spending as a predictive feature for commodity returns.

The hackathon was hosted by the Columbia Data Science Society and sponsored by Google Cloud Platform, Bloomberg, NBC Universal, Digital Reasoning, Honeywell, Two Sigma, Facebook, and Enigma.

--By Robert Florida
