Data Science Institute Students Conduct Real-World Data Science Research
In their final semester, DSI master’s students take a course that gives them real-world experience and helps them prepare for their careers, whether in industry, government or academia.
For the Capstone & Ethics course, the students divide into small teams to work on data-science challenges, some of which are provided to them by DSI’s industry affiliates and Columbia faculty. The teams work under the direction of their mentors, who guide them through the entire cycle of how to use data science to solve challenges facing their organizations. The students work on their projects over the course of the semester, at the end of which they present their findings to their professors, affiliates and fellow students.
The Moelis team’s Capstone, “Future Profit Warning Classification Using Earnings Call Transcript,” was a three-part project that used sentiment analysis of news and machine learning techniques to design predictive models of stock-price performance, profit warnings and projected revenues.
Team Neoway’s project, “Classifying Food and Beverage Establishments from Website Data,” built a machine learning pipeline to classify a massive number of food and beverage operator websites, significantly improving on the company's current practice.
Team KPMG’s project, “Algorithmic Comment Processing,” involved identifying and summarizing sections in PDF documents for KPMG’s client, regulations.gov.
And for its project, “Rich and Cheap Bond Recommendation,” team Vanguard developed a prototype of a front-end platform that recommends bonds and helps portfolio managers make more informed decisions.
The four projects gave students an insider's look at the real, current data science problems facing companies, said Sining Chen, Adjunct Professor of Industrial Engineering and Operations Research, who taught the spring 2019 Capstone class.
“The students had a chance to learn and apply a wide range of tools from data scraping to data visualization, to NLP and state-of-the-art machine learning algorithms,” added Chen. “They learned through trial and error – the best way to learn – having to come up with their own creative approaches. The projects helped them develop not only their data science skills but also their critical thinking, communication, writing and presentation skills as well as management and collaborative abilities, all of which made them more well-rounded and even more competitive in today’s job market. And the affiliates were tremendous in proposing topics that were cutting-edge and real yet appropriate for training purposes.”
For their part, DSI students say the Capstone course helped them demonstrate and sharpen their data-science skills – experience that will serve them well when they later interview for jobs or apply to doctoral programs. As it is, DSI students are sought after, but the hands-on experience of the Capstone & Ethics course enhances their marketability since they can discuss their projects during interviews with recruiters and admissions officials.
Pranjal Bajaj, a 2019 graduate who was part of the KPMG team, characterized the Capstone experience as a “highly rewarding hands-on challenge.”
“Working on the Capstone project provided us a deep insight into what it’s like to work as a professional data scientist,” Bajaj said. “We started from scratch, scraping the data, spending weeks pre-processing it before building and fine-tuning our models to ensure that they serve the end user and create business impact,” added Bajaj, who will begin working this summer as a data scientist at Boston Consulting Group (BCG) GAMMA. “This experience also helped me during the job interview process, where I discussed my Capstone project.”
What follows are slide presentations from each of the four Capstone teams – presentations that detail the goals, methods and findings of each project:
Future Profit Warning Classification Using Earnings Call Transcript:
Classifying Food and Beverage Establishments from Website Data:
Algorithmic Comment Processing:
Rich and Cheap Bond Recommendation: