Scholars Program Projects

  • Breaking the neural code of a cnidarian

    Analysis of the relationship between the activity of all the neurons in the cnidarian Hydra and its behavioral repertoire.

  • Technology transfer insights from data

    The core mission of Columbia Technology Ventures (CTV) is to facilitate the transfer of inventions from academic labs to the market for the benefit of society. In a typical year, CTV receives ~400 inventions, completes ~100 licenses and options, and helps form ~20 startups.

  • Project ToxicDocs

    We house the world’s largest dataset of once-secret documents on industrial pollution, released from the vaults of corporations like DuPont, Dow, and Monsanto through toxic tort litigation. We are applying data science methods to analyze this material and make it usable by a broad audience.

  • A longitudinal network study of Alzheimer’s and dementia care in relation to disparities and outcomes

    Alzheimer’s disease and related dementia (AD/dementia) represent a looming public health crisis, affecting roughly 5 million people in the U.S. and 11% of older adults. As with other chronic conditions, racial/ethnic and socio-economic disparities exist in the prevalence and burden of illness. However, less is known about how disparities in access to care influence the care trajectories – i.e., the scope, frequency and sequence of services used across healthcare settings – of those with AD/dementia.

  • Data Science and the regulation of financial markets (application closed)

    Computational techniques in natural language processing (NLP) and machine learning (ML) for analyzing large and complex bodies of text open new avenues for studying intricate processes, such as government regulation of financial markets, at a scale unimaginable even a few years ago. This project develops scalable NLP and ML algorithms (classification, clustering, and ranking methods) that automatically classify laws into various codes/labels, rank feature sets by use case, and induce the best structured representation of sentences for various types of computational analysis.

  • Enhancing self-directed learning opportunities

    Analyze data from one of the following library applications/systems and create visualizations that highlight the most important findings pertaining to the support of self-directed learning: Vialogues (TC Video Discussion Application), PocketKnowledge (TC Online Archive), DocDel (E-Reserve System), Pressible (Blogging Platform), Library Website and Mobile App.

  • Genomic and environmental predictor of preterm birth

    Predicting preterm birth in nulliparous women is challenging, and our efforts to develop predictors for that condition from environmental variables alone have produced classifiers with insufficient accuracy. Recent studies highlight the involvement of common genetic variants in the length of pregnancy. This project involves the development of a risk score for preterm birth based on both genetic and environmental attributes.

  • Global Interconnections Project (application closed)

    Understand the interconnected nature of global multinational companies through their supply chains, product and service competition, co-investments, and co-ownerships, as well as other dependencies between operations and revenue streams. We would like to study how news about any one company propagates down the connection graph and affects other businesses that are related in ways that are not necessarily explicit.
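
    As a rough illustration of the propagation idea, the sketch below spreads a news "shock" from one company across weighted links and records the strongest impact reaching each neighbor. The company names, the edge weights, and the `decay` and `threshold` parameters are all hypothetical, and the propagation rule is an assumption chosen for simplicity, not the project's method.

```python
from collections import defaultdict

def propagate_impact(edges, source, shock=1.0, decay=0.5, threshold=0.01):
    """Spread a news shock from `source` through weighted company links.

    edges: iterable of (company_a, company_b, weight), weight in [0, 1].
    Returns a dict mapping each reached company to its strongest impact.
    """
    graph = defaultdict(list)
    for a, b, w in edges:
        graph[a].append((b, w))
        graph[b].append((a, w))  # assumption: dependencies are undirected

    impact = {source: shock}
    frontier = [source]
    while frontier:
        next_frontier = []
        for company in frontier:
            for neighbor, weight in graph[company]:
                # Each hop attenuates the shock by the link weight and decay.
                value = impact[company] * weight * decay
                if value > threshold and value > impact.get(neighbor, 0.0):
                    impact[neighbor] = value
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return impact

# Hypothetical toy graph: a supplier feeds a manufacturer, which sells
# through a retailer and is weakly tied to a competitor.
edges = [
    ("Supplier", "Manufacturer", 0.8),
    ("Manufacturer", "Retailer", 0.6),
    ("Manufacturer", "Competitor", 0.2),
]
impact = propagate_impact(edges, "Supplier")
```

    In this toy graph the retailer is affected more strongly than the competitor even though neither is linked to the supplier directly, which is the kind of implicit relationship the project description refers to.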

  • Internships in DSI Center for Data, Media & Society

    The DSI Center for Data, Media & Society is seeking undergraduate and master’s students to work during the summer on projects at the intersection of computer science, data science, and the humanities. These projects will combine domain expertise in the humanities with computer and data science techniques to tackle important societal and media problems. Projects range from documenting human rights violations and providing rural farmers with financial safety nets to analyzing the sources of social media popularity, and more!

  • Modeling Genomic Evolution with Machine Learning

    A Fall 2018 internship is available in the Eaton lab to work on the development and application of machine learning approaches to historical evolutionary inference. Research will involve learning to use high-performance distributed computing infrastructure, performing population genetic simulations, fitting machine learning models, and writing reproducible, shareable code. The ideal candidate will have experience with and interest in Python coding and a reasonable understanding of linear algebra.

  • Neuronal Ensemble Detection with Temporal CRF (application closed)

    Given calcium imaging data of active neurons, can we detect groups of co-firing neurons, called neuronal ensembles? We have a number of datasets consisting of hundreds of neurons imaged for thousands of time steps, and seek to extend an existing CRF model to consider temporal relationships. The goal is to be able to detect neuronal ensembles that span multiple time steps, and that are not conditioned on external stimuli.
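
    To make the notion of co-firing concrete, here is a minimal baseline sketch, not the CRF model mentioned above: it links two neurons when the sets of time steps at which they fire overlap strongly (Jaccard similarity) and reports connected groups as candidate ensembles. The neuron ids, the toy raster, and the `min_jaccard` threshold are hypothetical, and this static grouping ignores the temporal relationships the project aims to model.

```python
from itertools import combinations

def cofiring_groups(raster, min_jaccard=0.5):
    """Group neurons whose active time steps overlap strongly.

    raster: dict mapping neuron id -> set of time steps at which it fired.
    Two neurons are linked when the Jaccard overlap of their active time
    steps is at least `min_jaccard`; groups are connected components.
    """
    parent = {n: n for n in raster}

    def find(n):
        # Union-find root lookup with path compression.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(raster, 2):
        union = raster[a] | raster[b]
        if union and len(raster[a] & raster[b]) / len(union) >= min_jaccard:
            parent[find(a)] = find(b)

    groups = {}
    for n in raster:
        groups.setdefault(find(n), set()).add(n)
    # Keep only groups with at least two neurons.
    return [g for g in groups.values() if len(g) > 1]

# Hypothetical binarized activity: neuron id -> time steps with a spike.
raster = {
    "n1": {1, 2, 5, 8},
    "n2": {1, 2, 5, 9},
    "n3": {2, 5, 8},
    "n4": {3, 6},
    "n5": {3, 6, 7},
}
ensembles = cofiring_groups(raster)
```

    On this toy raster the sketch finds two candidate ensembles, one containing n1, n2, and n3 and one containing n4 and n5. Neurons n2 and n3 end up grouped through n1 even though their direct overlap is below the threshold.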

  • Real-time observation and navigation of a multitude of autonomous cars in dense urban traffic intersections with many pedestrians

    Project components: (i) Monitoring of traffic intersections using bird’s-eye cameras, supported by ultra-low-latency computational/communication hubs; (ii) Simultaneous video-based tracking of cars and pedestrians, and prediction of movement based on long-term observations of the intersection; (iii) Real-time computational processing, using deep learning on GPUs, in support of (ii); (iv) Sub-10 ms latency communication between all vehicles and the computational/communication hub, to be used in support of autonomous vehicle navigation.

  • Real-time brain state classification

    Using machine learning to classify brain states in real time from EEG/fNIRS/fMRI data.

  • Genomic and environmental predictor of preterm birth

    Networked systems are ubiquitous in modern society. In a dynamic social or biological environment, the interactions among subjects can undergo large and systematic changes. Thanks to rapid advances in technology, many social networks are now observed with time information. Examples include email communication networks between users, comments on Facebook, and retweet activity on Twitter. We aim to propose new statistical models and associated methodologies for various problems, including community detection, change point detection, and behavior prediction. The proposed methods will be evaluated on a wide range of network datasets from different areas.

  • Global Interconnections Project (application closed)

    DNA sequence reads from a community of microbial genomes are currently processed without considering sequence variants. The project involves building a pipeline that processes billions of such short reads, identifies the closest strains they might belong to, assembles them into specific clones, calls their variants, and analyzes the dynamics of these bacterial strains across sampling points.

  • Internships in DSI Center for Data, Media & Society

    Recently, Columbia University, Cornell, and NewYork-Presbyterian agreed to integrate their clinical (healthcare) and business IT systems onto one shared platform called Epic. The motivating factors for the move to Epic are to enhance the patient experience, improve and integrate care, and give our physicians an integrated technology platform that supports the mission of an academic medical center. The intern will assist with developing the “operational” analytics capabilities of Columbia University Medical Center, including financial, healthcare operations, and healthcare quality analytics.

  • Modeling Genomic Evolution with Machine Learning

    Microelectrode array recordings from patients undergoing surgical evaluation have captured typical clinical seizures. Because of the extreme pathological conditions at these times, identifying single units from extracellular data is a particular challenge. Our group has developed techniques for tracking neurons through the ictal transition. We are applying them to newly acquired data and addressing fundamental questions about the activity of different cell classes at seizure initiation.

  • Neuronal Ensemble Detection with Temporal CRF (application closed)

    The quality of biomedical evidence can affect research sustainability, patient safety, and the public’s trust in biomedical research. However, the quality of biomedical evidence often remains opaque to the public, and it is imperative to improve its transparency. This project aims to leverage public data sources, including but not limited to ClinicalTrials.gov, the PubMed database of biomedical literature, and the National Health and Nutrition Examination Survey (NHANES) database, to develop and apply novel data mining and visualization methods for appraising biomedical research evidence, uncovering implicit biases in clinical research designs at different levels, and presenting this information intuitively to the public. Students on this project will acquire or hone their skills in data mining, results presentation, and user interface design and evaluation.

  • Real time observation and navigation of multitude of autonomous cars, in dense urban traffic intersections with many pedestrians

    We are collecting and analyzing survey data asking people about the political attitudes and other characteristics of their family, friends, and others in their social circles. Some of this work is described here and we are also doing polling relevant to the 2018 midterm elections.

  • Real-time brain state classification

    Robotic grasp planning based on raw sensory data is difficult due to occlusion and incomplete scene geometry. Often a single sensory modality does not provide enough context to enable reliable planning: a single depth sensor image cannot provide information about occluded regions of an object, and tactile information is spatially very sparse. We are building a deep learning CNN that combines 3D vision and tactile information to perform shape completion of an object seen from a single view and to plan stable grasps on the completed models.

Learn more about the DSI Scholars Program


550 W. 120th St., Northwest Corner 1401, New York, NY 10027    212-854-5660
©2018 Columbia University