Thirty teams of students worked through the night analyzing and visualizing data at Columbia’s second annual Data Science Student Challenge, held Sept. 30 – Oct. 1.
This year’s event featured data supplied by Bloomberg and Walmart, and Microsoft’s data analysis software, Cortana Intelligence.
At the end of the event, a panel of judges awarded four teams $10,000 in prizes. The winning projects are summarized below. The event was sponsored by Bloomberg, Microsoft, Walmart, Columbia Data Science Society and the Data Science Institute.
First place, $4,000 Predicting Investment Portfolio Risk When the Market Drops
The 2008 financial crisis put renewed focus on how banks and investors manage risk. Analyzing 10 years of Bloomberg financial data, the team developed a tool to help portfolio managers estimate risk in a falling stock market. They built a sorted correlation matrix and clustering models to visualize risk patterns, and later combined the two in regression models to see how asset correlations change when the market falls.
To their surprise, they found that asset correlations decline for assets from different clusters, like those found in a diversified portfolio, while asset correlations increase for assets within a cluster. A lower correlation indicates that losses from one asset are being offset by the gains of another.
They combined their models and results into a toolkit and, using Azure’s Power BI Dashboard, visualized the clusters to make the risk distribution within a portfolio of stocks easy to see. The tool shows a cluster risk parity portfolio, with each cluster’s risk contribution as a benchmark. Regression coefficients are displayed to make it easier to estimate a portfolio’s risk when the market falls.
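The idea of comparing asset correlations in normal and falling markets can be illustrated with a minimal sketch. It is not the team's toolkit: the function names, the rule that a "falling" day is any day with a negative market return, and the return figures are all assumptions made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_by_regime(asset_a, asset_b, market):
    """Return (correlation on all days, correlation on days the market fell)."""
    # Assumption: a "falling" day is one with a negative market return.
    down = [i for i, r in enumerate(market) if r < 0]
    a_down = [asset_a[i] for i in down]
    b_down = [asset_b[i] for i in down]
    return pearson(asset_a, asset_b), pearson(a_down, b_down)

# Made-up daily returns, for illustration only.
market  = [0.01, -0.02, 0.005, -0.015, 0.02, -0.01, 0.003, -0.03]
asset_a = [0.012, -0.025, 0.004, -0.010, 0.018, -0.012, 0.001, -0.035]
asset_b = [0.008, 0.010, 0.006, 0.004, -0.015, 0.009, 0.002, 0.020]

overall, falling = correlation_by_regime(asset_a, asset_b, market)
print(f"all days: {overall:.2f}, falling days: {falling:.2f}")
```

Running the same comparison for pairs of assets inside one cluster and for pairs drawn from different clusters would show whether correlations move in opposite directions, as the team reported.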
Team: Jerry Wong and Puxin Xu, both of Lehigh University
Second place, $3,000 Predicting a Customer’s Second-Choice Product
Anticipating what customers will pick when the item they want is sold out helps retailers decide which products to stock. Shoppers are less likely to be disappointed if they can find a substitute item.
Working with data provided by Walmart, the team built a multiclass logistic regression model to predict the most popular product at a particular store on a given day. The model analyzes each store’s features as well as the number and type of products in stock. In hypothetical situations in which a specific product is not available, the model tries to predict what customers will pick instead.
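One way to picture the second-choice prediction is to take the per-product scores a multiclass logistic regression would produce, drop the product that is out of stock, and renormalize the rest. The sketch below assumes that approach; the article does not say how the team handled unavailable products, and the product names and scores are hypothetical.

```python
import math

def softmax(scores):
    """Turn per-product scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

def predict_choice(scores, unavailable=()):
    """Most likely product among those in stock, plus the probabilities."""
    in_stock = {n: s for n, s in scores.items() if n not in unavailable}
    probs = softmax(in_stock)
    return max(probs, key=probs.get), probs

# Hypothetical scores for one store on one day (not real Walmart data).
scores = {"cola": 2.0, "lemonade": 1.2, "water": 0.5}

first, _ = predict_choice(scores)
second, probs = predict_choice(scores, unavailable={"cola"})
print(first, second)
```

Renormalizing after removing a product assumes shoppers redistribute in proportion to the remaining scores, which is a simplification a real model might not make.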
Team: Sanjmeet Abrol, Stephanie Doctor, Rachel Zhang and Amla Srivastava
Third place, $2,000 Predicting Next-Day Stock Prices
The team analyzed 10 years of Bloomberg financial data to try to predict next-day stock performance. Using currency-exchange rates, commodity prices and other economic indicators, they developed a classification algorithm to predict whether a stock would go up or down the following day. They calculated the log-return between the stock price at the end of a trading day and its price 30 days earlier to predict which direction it would move the next day.
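The 30-day log-return is ln(P_t / P_{t-30}), where P_t is the closing price on day t. A minimal sketch of turning a price series into (feature, label) pairs for an up-or-down classifier follows. It assumes each list entry is one trading day's close, uses made-up prices, and leaves out the other indicators the team used.

```python
import math

def log_return(prices, t, lookback=30):
    """Log-return between the close on day t and the close `lookback` days earlier."""
    return math.log(prices[t] / prices[t - lookback])

def build_examples(prices, lookback=30):
    """Pair each day's log-return with a label: 1 if the next close is higher, else 0."""
    examples = []
    for t in range(lookback, len(prices) - 1):
        feature = log_return(prices, t, lookback)
        label = 1 if prices[t + 1] > prices[t] else 0
        examples.append((feature, label))
    return examples

# Made-up closing prices that rise by 0.5 each day, for illustration only.
prices = [100.0 + 0.5 * i for i in range(40)]
examples = build_examples(prices)
print(len(examples), examples[0])
```

The loop stops one day before the end of the series because the last day has no next-day price to label.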
Team: Woojin Kim, Pablo Vicente Juan, Jose Vicente Ruiz Cepeda
Fourth place, $1,000 Predicting Product Sales to Manage Inventory
The team analyzed a year’s worth of Walmart data to understand how the sale of one product influences the sale of another. Someone buying milk, for example, may be more likely to buy cereal and less likely to buy eggs. The team looked for correlations among product sales and inventory stocks. Picking the most highly correlated data, they built a model to predict the sales of individual products based on the availability of other products. The information is meant to help retailers manage inventory, especially products that may deter sales of related products.
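Picking the most highly correlated data can be sketched as ranking candidate series by the strength of their correlation with a target product's sales. The example below is an assumption about how such a screen might look, not the team's code; the product names and numbers are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_predictors(target_sales, candidates):
    """Sort candidate series by absolute correlation with the target, strongest first."""
    scored = [(name, pearson(series, target_sales))
              for name, series in candidates.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)

# Invented daily figures, for illustration only.
cereal_sales = [12, 15, 9, 20, 18, 7, 14]
stock_levels = {
    "milk": [30, 36, 22, 48, 44, 18, 33],
    "eggs": [25, 20, 28, 12, 15, 30, 22],
    "soap": [10, 11, 10, 9, 11, 10, 10],
}

ranking = rank_predictors(cereal_sales, stock_levels)
for name, corr in ranking:
    print(f"{name}: {corr:+.2f}")
```

Ranking by absolute value keeps strongly negative relationships, such as a product that appears to deter sales of another, alongside strongly positive ones.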
Team: Vinayak Bakshi, Vijay Balaji, Conrad De Peuter, Abhay Pawar
— Kim Martineau