Monday, December 5, 2022, 9:00 am - 12:00 pm
The Capstone course provides a unique opportunity for students in the M.S. in Data Science program to apply their knowledge of the foundations, theory and methods of data science to address data-driven problems in industry, government and the non-profit sector.
Course activities focus on a semester-length project sponsored by a local organization. The resulting projects synthesize the statistical, computational, engineering and social challenges involved in solving complex real-world problems.
The Fall 2022 Capstone course reflects enormous interest in data science, with 43 teams exhibiting at the presentation event. Join us to explore the projects and meet with the participating students and mentors. Find project themes and companies below.
Location: Pulitzer Hall, Columbia Graduate School of Journalism (Joseph D. Jamail Lecture Hall), 2950 Broadway, New York, NY 10027
2:00 PM: Join the event. Poster Presentations will be on view until 5:00 PM ET; guests are welcome to float in and out of the event to speak with the students and learn about their projects.
Research will be organized into several categories:
Food and beverages will be served throughout the event!
2:30 PM: Introductions from Capstone faculty. Learn more about the Capstone program and its impact across the Data Science Institute and Columbia University at large.
5:00 PM: Event ends.
P01: Entity Resolution and Data Analysis of Author Contribution Statements
P02: Identification of Replication “Citances”
P03: Regulatory Requirements and Policy Standards and (Large-Language-Model) Benchmarking
P04: Peace Speech Project
P05: Knowledge Graph on Unstructured Data using Unsupervised Approach for Finance Domain with Natural Language Search Enablement
P06: Detection of Trust in Call Center Interactions
P07: Hierarchical Topic Modeling over Financial Documents
P08: Supervised Learning Methods for Natural Language Processing
P09: Fine-Tuned Relationship Extraction for Consumer Goods Concepts (1)
P10: Fine-Tuned Relationship Extraction for Consumer Goods Concepts (2)
P11: A Data-Driven Analysis of Socio-Economic Factors that impact Enrolment in Clinical Trials
P12: Prediction of Commercial Insurance Payments for Surgical Procedure using Machine Learning
P13: Prediction of Commercial Insurance Payments for Surgical Procedure using DataRobot
P14: Placement Optimization of EV Chargers in the US
P15: Price Optimization in Pharma through Discount Allocation via Machine Learning
P16: Extending Satellite Observations to Ocean Depths with Machine Learning
P17: Measurements on Greenland Surface Mass Loss with Predictions on Albedo via Machine Learning
P18: Are Government Broadband Internet Subsidies a Waste of Money?
P19: Evaluating the Attractiveness of a Country for Business Investment using World Bank Indicators
P20: Time Series Financial Forecasting
P21: Improving the Sales Forecasting Process by Modeling the Lifecycle Events of a Drug
P22: Renewable Energy Growth Challenge
P23: Staying Ahead of Renewable Energy Curve, Analysis on Reusable Blades
P24: RalphLauren.com Website Search – Keyword optimization
P25: Patent Data and the Evolution of Location
P26: Galaxy-by-Galaxy Emulation of Cosmo-Hydrodynamical Simulations of Galaxy Formation
P27: MUTABLE
P28: Machine Learning in Rehabilitation Robotics
P29: Fault Detection and Prognosis in Astronomical Observatory Operational Data in Chile
P30: AI and Machine Learning Project Exploring the Clinical-Genomic Correlation of Cutaneous T-Cell Lymphoma (CTCL)
P31: Early Detection of Endometriosis from Electronic Health Record Data and Claims Data
P32: Data Analysis of Single Cell RNA Sequencing for Neuropsychiatric Disorders
P33: Accelerating Drug Discovery through Active Learning-Enhanced Virtual Screening
P34: Automatic Landcover Change Detection and Classification from Satellite Images
P35: Land Cover Change Detection using Neural Network for Satellite Images
P36: Capturing Pavement Markings using Machine Learning Algorithms
P37: Radiology Report Generation Using a Multi-Modal Prototype Network
P38: Improving Speech Transcription Accuracy by Decoding Audio with Language Model on Wav2Vec2.0 Framework
P39: Using Remote-Sensing Data to Understand Characteristics of Vegetation, such as Species, Health, have several industry applications (1)
P40: Using Remote-Sensing Data to Understand Characteristics of Vegetation, such as Species, Health, have several industry applications (2)
P41: One-Shot Learning for Face Recognition
P42: Creating Multilingual Speech Emotion Recognition Systems JPMorgan (1)
P43: Creating Multilingual Speech Emotion Recognition Systems JPMorgan (2)
Sining Chen, Adjunct Professor of Industrial Engineering and Operations Research, Columbia University
Adam S. Kelleher, Adjunct Assistant Professor of Computer Science, Columbia University
Yuan (Vivian) Zhang, Department of Biostatistics, Columbia University