Hackathon Winners Mine Bloomberg, Walmart Data for Insights

October 11, 2016

The team sorted stocks with similar risk profiles into clusters, visualized at left. The hypothetical portfolio at right leverages this information to find optimal allocations for each stock (above). Though allocations vary, the risk in each category is the same (below). (Jerry Wong)

Thirty teams of students worked through the night analyzing and visualizing data at Columbia’s second annual Data Science Student Challenge, held Sept. 30 – Oct. 1.

This year’s event featured data supplied by Bloomberg and Walmart, and Microsoft’s data analysis software, Cortana Intelligence.

At the end of the event, a panel of judges awarded a total of $10,000 in prizes to four teams. The winning projects are summarized below. The event was sponsored by Bloomberg, Microsoft, Walmart, Columbia Data Science Society and the Data Science Institute.

First place, $4,000 Predicting Investment Portfolio Risk When the Market Drops

The 2008 financial crisis put renewed focus on how banks and investors manage risk. Analyzing 10 years of Bloomberg financial data, the team developed a tool to help portfolio managers estimate risk in a falling stock market. They built a sorted correlation matrix and clustering models to visualize risk patterns, and later combined the two in regression models to see how asset correlations change when the market falls.

To their surprise, they found that asset correlations decline for assets from different clusters, like those found in a diversified portfolio, while asset correlations increase for assets within a cluster. A lower correlation indicates that losses from one asset are being offset by the gains of another.
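For readers who want to see the idea in code, here is a minimal sketch of comparing an asset pair's correlation on all days with its correlation on days the market fell. It is not the team's regression model; it swaps in a simpler conditional-correlation check. The function names and the return figures are made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists of returns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlation_by_regime(asset_a, asset_b, market):
    """Return (correlation on all days, correlation on days the market fell)."""
    down_days = [i for i, m in enumerate(market) if m < 0]
    a_down = [asset_a[i] for i in down_days]
    b_down = [asset_b[i] for i in down_days]
    return pearson(asset_a, asset_b), pearson(a_down, b_down)

# Hypothetical daily returns, invented for this example.
market = [0.010, -0.020, 0.005, -0.015, 0.020, -0.010]
asset_a = [0.012, -0.025, 0.004, -0.020, 0.018, -0.012]
asset_b = [0.008, -0.030, 0.006, -0.010, 0.025, -0.020]

overall, falling = correlation_by_regime(asset_a, asset_b, market)
print(f"all days: {overall:.2f}, down days: {falling:.2f}")
```

Running the same comparison for pairs inside one cluster and pairs drawn from different clusters is one way to look for the pattern the team describes.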

Top three teams from left to right: Stephanie Doctor and Rachel Zhang for second place; Jerry Wong and Puxin Xu for first place; and Pablo Vicente Juan, Jose Vicente Ruiz Cepeda and Woojin Kim for third place. (Kim Martineau)

They combined their models and results into a toolkit and, with Azure’s Power BI Dashboard, visualized the clusters to make the risk distribution within one portfolio of stocks easy to see. The tool shows a cluster risk parity portfolio, with each cluster’s risk contribution as a benchmark. Regression coefficients are displayed to make it easier to estimate a portfolio’s risk when the market falls.
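A risk parity portfolio gives each cluster the same share of total risk. The sketch below shows the simplest version, inverse-volatility weighting, which equalizes risk contributions only under the assumption that clusters are uncorrelated with one another. The cluster names and volatility figures are hypothetical, and the team's actual allocation method may differ.

```python
def inverse_volatility_weights(vols):
    """Weight each cluster in proportion to 1 / volatility, summing to 1."""
    inverse = {name: 1.0 / v for name, v in vols.items()}
    total = sum(inverse.values())
    return {name: x / total for name, x in inverse.items()}

# Hypothetical annualized volatilities for three clusters.
cluster_vols = {"cluster_1": 0.10, "cluster_2": 0.20, "cluster_3": 0.40}

weights = inverse_volatility_weights(cluster_vols)

# Standalone risk contribution of each cluster: weight times volatility.
risk = {name: weights[name] * cluster_vols[name] for name in weights}

for name in weights:
    print(f"{name}: weight={weights[name]:.3f}, risk={risk[name]:.4f}")
```

The allocations differ, with the calmest cluster receiving the largest weight, while the weight-times-volatility figure is the same for every cluster.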

Team: Jerry Wong and Puxin Xu, both of Lehigh University

Second place, $3,000 Predicting a Customer’s Second-Choice Product

Anticipating what customers will pick when the item they want is sold out helps retailers decide which products to stock. Shoppers are less likely to be disappointed if they can find a substitute item.

Working with data provided by Walmart, the team built a multiclass logistic regression model to predict the most popular product at a particular store on a given day. The model analyzes each store’s features as well as the number and type of products in stock. In hypothetical situations in which a specific product is not available, the model tries to predict what customers will pick instead.
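A multiclass logistic regression turns one score per product into probabilities with the softmax function. The sketch below illustrates one possible way to ask the sold-out question: drop the unavailable product and renormalize over the rest. This is an assumption for illustration, not the team's documented procedure, and the product names and scores are invented.

```python
import math

def softmax(scores):
    """Convert a dict of raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - peak) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def second_choice(scores, sold_out):
    """Most likely pick, and all probabilities, once `sold_out` is removed."""
    remaining = {k: v for k, v in scores.items() if k != sold_out}
    probs = softmax(remaining)
    return max(probs, key=probs.get), probs

# Hypothetical model scores for one store on one day.
scores = {"brand_a_milk": 2.0, "brand_b_milk": 1.5, "soy_milk": 0.3}

pick, probs = second_choice(scores, sold_out="brand_a_milk")
print(pick, probs)
```

In a trained model the scores would come from the store and inventory features rather than being typed in by hand.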

Team: Sanjmeet Abrol, Stephanie Doctor, Rachel Zhang and Amla Srivastava.

Third place, $2,000 Predicting Next-Day Stock Prices

The team analyzed 10 years of Bloomberg financial data to try to predict next-day stock performance. Using currency-exchange rates, commodity prices and other economic indicators, they developed a classification algorithm to predict whether a stock would go up or down the following day. They calculated the log-return between the stock price at the end of a trading day and its price 30 days earlier to predict which direction it would move the next day.
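The log-return over a lag is the natural logarithm of the ratio between the later price and the earlier one, ln(P_t / P_{t-30}). The sketch below computes that feature and the up-or-down label a classifier would be trained on. It assumes one closing price per row and treats "30 days" as 30 rows; whether the team counted trading days or calendar days is not stated. The price series is synthetic.

```python
import math

def log_return(prices, lag=30):
    """Log-return between each price and the price `lag` rows earlier."""
    return [math.log(prices[t] / prices[t - lag])
            for t in range(lag, len(prices))]

def next_day_direction(prices):
    """Label each day 1 if the next close is higher, else 0."""
    return [1 if prices[t + 1] > prices[t] else 0
            for t in range(len(prices) - 1)]

# Synthetic closing prices that rise 1 percent per day.
prices = [100 * 1.01 ** i for i in range(40)]

returns = log_return(prices)         # one value per day from day 30 onward
labels = next_day_direction(prices)  # one label per day except the last

print(len(returns), round(returns[0], 4))
```

Log-returns are convenient here because they add across periods: a 30-day log-return is the sum of the 30 daily log-returns.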

Team: Woojin Kim, Pablo Vicente Juan, Jose Vicente Ruiz Cepeda

Fourth place, $1,000 Predicting Product Sales to Manage Inventory

The team analyzed a year’s worth of Walmart data to understand how the sale of one product influences the sale of another. Someone buying milk, for example, may be more likely to buy cereal and less likely to buy eggs. The team looked for correlations among product sales and inventory stocks. Picking the most highly correlated data, they built a model to predict the sales of individual products based on the availability of other products. The information is meant to help retailers manage inventory, especially products that may deter sales of related products.

Team: Vinayak Bakshi, Vijay Balaji, Conrad De Peuter, Abhay Pawar

— Kim Martineau