Data Science Interdisciplinary ROADS Grant - Opportunities for Collaboration




The Institute is pleased to announce its third call for proposals from Columbia University faculty and research staff. Aimed at advancing research that combines data science expertise with domain expertise, the ROADS Provost Ignition grant is intended to help researchers interested in this theme come together. We are particularly drawn to faculty teams whose proposed projects will enable them to develop successful proposals for large-scale grants. We will look for applications that propose novel approaches to bringing scholars together to work on projects that cross traditional disciplinary boundaries. In essence, we are looking for methods that are not just "business as usual."

Below you will find a roster of individuals who have previously expressed interest in collaborating. If you would like to be listed here, please use the following link.

As we get additional requests, we will periodically update this roster.

 

Click on each person's name for more information about their interest area as well as their contact information.


Mitchell Elkind - Professor of Neurology and Epidemiology

I am interested in collaborating with a data scientist/engineer to explore the use of machine learning in the analysis of continuous cardiac telemetry data, with the aim of identifying patterns of cardiac electrical activity (electrocardiography, EKG) that predict the appearance of atrial fibrillation. Atrial fibrillation (AF) is a very common cardiac arrhythmia and a major risk factor for stroke. When identified, it is treated with strong blood thinners (anticoagulants) to prevent stroke. The problem is that many patients have short, asymptomatic episodes of AF that go undetected, so long-term monitoring (from 30 days with wearable devices up to 3 years with implanted devices) is frequently done to detect the abnormal rhythm, which may occur only at the end of a long period of monitoring. My hypothesis is that machine learning could help identify patterns of electrocardiographic activity before AF occurs (for example, during the first week of wearing a monitor) that will predict which patients develop AF later (e.g., during the third week of wearing it). These features could include combinations of some or all of the following EKG characteristics: P wave characteristics, characteristics of other waves, frequency of premature atrial contractions, occurrence of supraventricular arrhythmias, frequency and duration of these episodes, etc. The ability to detect AF early and thus initiate treatment sooner would have a major impact on patient diagnosis and treatment. I am looking for a data engineer with expertise in the following areas: machine learning, analysis of large datasets, and ideally some experience with medical diagnostic testing and electrocardiographic analysis. I have consulted for a company that performs this outpatient monitoring throughout the US, and they have expressed interest in collaborating on this. Using remote monitoring, they record over 1 billion heartbeats daily. Most patients undergo monitoring for thirty days. This would provide a very rich dataset for derivation and validation analyses.
Webpage
Contact

Andrew Gelman - Professor of Statistics and Political Science

We are working on the development of models, algorithms, and software for Bayesian statistics. We are looking for collaborators who have applied problems that need quantification of uncertainty, or who have expertise in probabilistic programming. We are particularly interested in large problems for which sophisticated statistical models are needed, since these pose the serious computational challenges that motivate our research and software development. Our open-source Bayesian inference program Stan has been successful, but we seek new sources of funding, possibly through health sciences and engineering. We believe that, with the right collaborators, there is the possibility of much more funding and development through collaboration with large research projects that need statistical inference with big data.
Webpage
Contact

Einat Lev - Lamont Assistant Research Professor

The goal is to create an automatic classification scheme for the morphology and surface roughness of volcanic deposits (mostly lava flows) from drone and satellite data. The morphology of lava flows tells us a lot about how the eruption evolved and what the pre-existing conditions were. This information helps us estimate and mitigate hazards from future eruptions and interpret past eruptions that were not observed (e.g., on other planets). Once we develop the scheme, it will be applied to data collected on Earth and on other planets. The results will feed into hazard models as well as climate models and mission planning. The collaborating data-science PI should have experience with image processing, clustering, and automatic classification methods.
Webpage
Contact

Gina S. Lovasi - Assistant Professor of Epidemiology

I am working to understand how local policies, infrastructure, and other influences on local environments affect health. In particular, I want to know whether the actions taken to protect population health have the anticipated benefits, which population groups benefit most, and whether these same actions have unanticipated detrimental effects on some domains of health. To move research on population health drivers toward opportunistic longitudinal evaluations, we need a better way to scan for what is being done to promote health across multiple municipalities, health catchment areas, or other settings. Starting with NYC, our research team has been interested in cataloging the "health in all policies" approach with respect to specifically targeted cardiovascular disease risk factors (tobacco, pollution, diet, exercise). We have screened all legislation titles and text for relevance to these topics using labor-intensive manual review. Likewise, to assess the evidence base for these policies and other local initiatives, we have conducted systematic searches of the peer-reviewed literature. We are now exploring more scalable approaches (including web scraping, natural language processing, and unsupervised learning). For this internal RFP, we would hope to (1) further develop and validate our efforts to document the evolution of legislation and scientific evidence relevant to a multi-sectoral approach to public health, (2) expand our review of municipal policies to other cities (we have identified several large US cities with text data somewhat comparable to NYC), and (3) explore whether social and print media coverage, grey literature, or other sources can help us to be more comprehensive in capturing the non-legislative actions that affect local health outcomes and the evidence for their effectiveness. We would welcome input from collaborators with expertise in natural language processing and machine learning. In addition, skills in data visualization and an understanding of regression analysis would be helpful. Finally, a collaborating PI should have an interest in contributing to peer-reviewed publications and external funding applications.
Webpage
Contact

Dorothy M. Peteet - Adjunct Professor for Earth & Environmental Sciences

My area of discipline is paleoecology, paleoclimate, palynology, and macrofossil studies. We seek to provide a usable online database with high-resolution photos and data from our extensive modern reference seed collection of 4,000 species. A DNA database for identifying heritage seeds is a further possibility. We need to make this unique collection available to other researchers all over the world. At present, we have not even digitized the typed cards linked to the seeds in bottles.
Webpage
Contact

Dagmar Riedel - Associate Research Scholar

I am looking for a PI with an interest in Big Data in the Digital Humanities, especially with regard to developing search functions for an academic online encyclopedia and linked data. Additional interests could be large databases, strategies of long-term digital preservation, structured vocabularies, and crowdsourcing. The project would concern the Encyclopaedia Iranica (EIr), as I am one of its associate editors. The EIr was established in the 1970s at Columbia as a collaborative Iranian Studies project. From 1978 until 2015 the project was supported by NEH grants. Its Open-Access (OA) online edition (www.iranicaonline.org) currently has more than 9,000 entries on its website. About 12 percent of the entries are born-digital. But the project does not yet have a comprehensive research data management plan, because it emerged as a website that merely provided information about a printed reference work. In the online database, the entries are saved as text, but not as "data": they have no mark-up and minimal metadata. In addition, the internal taxonomy is not complemented by a folksonomy of user-generated tags. Nor does the online edition use linked data, even though the last decade has seen a burgeoning of digital surrogates available OA. All of these matters need to be addressed in connection with the next update of the current Content Management System (CMS), which was developed in 2009/2010, in order to provide a more adequate search function for the website and to ensure long-term usability of its contents. Since the project is supported by an international research community, the new CMS would ideally also allow for metadata that document the history of each entry, from invitation to editing and later corrections/updates of the published entry.
The larger context of these concrete challenges is the question of how to conceptualize proprietary research data in the Humanities, which traditionally focus on facts as the foundation of interpretation but do not conceive of their research as data.
Contact

Juan Francisco Saldarriaga - Research Scholar and Adjunct Assistant Professor of Urban Planning and Architecture

I am interested in projects that deal with the urban environment and that involve mapping and data visualization. Possible areas of study could be transportation networks or social media and their relationship with the urban environment. My area of discipline is urban geographic information systems and data visualization. My collaborating PI should have expertise in computer science, engineering, or environmental science. I have done multiple projects involving interactive data visualization and mapping around urban topics.
Webpage
Contact

Maura Boldrini - Associate Professor of Clinical Neurobiology in Psychiatry

Area of Discipline: Brain Plasticity
I am studying stem cells in the human brain and working on understanding how they grow and differentiate into new neurons in the context of aging, neuropsychiatric diseases, stress exposure, and pharmacologic treatment. I am trying to understand how these new cells grow and survive in relation to biological and environmental stimuli. I am looking to collaborate with a PI with expertise in statistical tools for complex databases, stochastic systems, and chaos theory.
Contact

Joachim Frank - Professor, Biochem & Mol Biophys

Area of Discipline: Cryo-Electron microscopy, structural biology, mechanism of protein biosynthesis
I'm collaborating with Dr. Abbas Ourmazd (U. of Wisconsin in Milwaukee), a physicist, on large-scale analysis of data (on the order of a million images or more) from electron microscopy of molecular machines. We have achieved a breakthrough (see Dashti et al., PNAS 2014) which I believe will revolutionize the study of molecular machines, as it leads to a mapping of the energy landscape. Dr. Ourmazd's relevant large-data expertise, together with specific funding, would allow us to develop at an accelerated pace a software platform for mapping the free-energy landscape of molecular machines imaged by cryo-EM, as a benefit to the community.
Contact

John Hunt - Professor, Biological Sciences

Area of Discipline: Protein biochemistry & biophysics
We are interested in large-scale mapping of functions and interactions in the known protein universe, which currently includes ~90,000,000 sequences and is likely to double every year for the foreseeable future. Proteins perform the vast majority of chemical and structural functions in living organisms, making a comprehensive mapping of the protein universe important but challenging due to the large amount of data. We have performed a partial functional mapping of ~4,000,000 bacterial sequences. We aspire to establish the definitive protein-information resource in the world, which would require keeping up continuously with the accelerating onslaught of sequence information as well as extending our methods for inferring functional relationships. We are looking for a collaborating PI with expertise in (i) development of computing systems to continuously track and update vast amounts of sequence data, and (ii) sequence covariance analysis (e.g., using the Marks method -- http://dx.doi.org/10.7554/eLife.03430).
Contact


550 W. 120th St., Northwest Corner 1401, New York, NY 10027    212-854-5660
©2017 Columbia University