Interface of Natural and Data Sciences


Collaboration Opportunities between the Natural Sciences and Data Sciences


Collaboration across disciplines between the natural sciences and data sciences is an integral aspect of the Interface grant opportunity funded in part by the Gordon and Betty Moore Foundation through Grant GBMF3941 to Columbia University and the Alfred P. Sloan Foundation.

In addition to the Foundations Center faculty, this page lists faculty from these interdisciplinary backgrounds who have expressed an interest in collaborating and submitting a proposal for the Institute's RFP. 

Click on each person's name for more information about their interest area as well as their contact information.


Ryan Abernathey - Asst. Professor of Earth & Environmental Science - Natural Science Expertise

I am interested in developing parallelized workflows for processing very large oceanographic datasets. The datasets take two forms: gridded spatiotemporal data (e.g. temperature, fluid velocity, etc.) from satellite observations and numerical simulations, and particle trajectory (i.e. Lagrangian) data from observations and simulations. The various datasets range in size from 1 TB to 1 PB. I want to develop a flexible system for running queries and calculating derived quantities from these datasets using map-reduce programming paradigms, ideally in Python. I am looking for a collaborator with experience in very large (i.e. "big data" scale) distributed file systems and in-place data analysis. Many of the challenges in my project are related to scalable architecture and system engineering, rather than machine learning. I think that Hadoop could be a powerful tool for these problems, but I have no experience with it. The ideal collaborator would have a background in parallel programming and system architecture, and hopefully also in Hadoop. I would like to build a flexible system, not just solve a specific problem.
Webpage
Contact

Peter Allen - Professor of Computer Science - Data Science Expertise

My expertise is in computer vision (both 2D and 3D imaging), feature detection and classification, large point cloud processing, and data visualization/animation. Projects needing these skills may be complementary.
Webpage
Contact

Robin Bell - PGI Research Professor of Lamont-Doherty Earth Observatory - Natural Science Expertise

Rising sea level from the loss of ice from Antarctica and Greenland will have global impacts in the coming decade. Natural scientists trying to constrain how fast the ice sheets will change tend to focus on single data streams because of the size of the data sets and the difficulty of analyzing multiple data sets simultaneously. It is imperative to integrate multiple datasets to answer how the large ice sheets respond to a changing climate. We are collecting a wide range of large data sets over Greenland and Antarctica that can address the following questions: What happens to the meltwater produced at the ice surface in Greenland? Will the surface meltwater lubricating the base of the ice sheet cause ice to flow faster into the ocean? Which Antarctic ice shelf is most vulnerable to collapse in the next century? Beginning this summer in Greenland, we will conduct large-scale airborne surveys with the NSF-funded instrumentation suite, IcePod, a new airborne geophysical platform developed by our research group here at Lamont-Doherty. The IcePod instrumentation suite includes laser altimetry, optical and infrared imagery to image the ice surface, and a radar depth sounder to collect information about the interior and the base of the ice sheet. These instruments produce 4 TB of data each flight, and each collects information at a different spatial resolution. The challenge is to develop effective tools to manipulate and analyze these massive new datasets, which simultaneously image the surface, the internal structure and the bottom of the 400-3500 m thick ice and have the potential to provide new insights into ice sheet stability. Our goal is to integrate, interrogate and visualize the full range of IcePod data sets in order to understand key ice sheet processes and to address major questions about the changing cryosphere.
This project will call on a data scientist’s exploratory data analysis skills, as well as their experience with statistical interpretations of geospatial data.
Webpage
Contact

Benjamin Bostick - Associate Professor of Lamont-Doherty Earth Observatory - Natural Science Expertise

We are interested in the geochemical processes that occur on the earth's surface, and how they evolve over time. For this project, we would like to develop improved algorithms that use time-resolved image stacks, collected by remote sensing at large scales and by X-ray microscopy at small scales, to understand the rates of geochemical processes and to better understand the effects of humans on those processes. We would like a collaborator with experience in processing spatial data and an interest in spatial-temporal correlations. Fourier transforms of data, and the kinds of decisions used to compress data, may also be useful in extracting additional spatial and temporal data from images usually lacking compression. Most importantly, we want a collaborator who can help develop our data in a new direction that we do not foresee ourselves.
Webpage
Contact

Albert Boulanger - Senior Staff Associate, CCLS - Data Science Expertise

While I was at Lamont, our group applied pattern recognition to 4D (time-lapse) seismic data. I am also interested in low-frequency EM whole-earth tomography and passive EM using lightning. I worked at NOAA deploying one of the first lightning detection networks. I’m exploring collaborations with data-oriented individuals willing to explore outside-the-box methodology. The 4D method we developed could be used for other time-lapse data sets (including medical), so I am looking for experts in the data: weather, EM, seismic, GPR, gravity, hyper-spectral. Our group at CCLS (Roger Anderson's) is applying machine learning to predict building performance, so time series prediction is also a good direction. Weather affects building performance, so I have initiated research in urban microclimates with Lamont summer interns; that is another possible match.
Webpage
Contact

Joaquim Goes - Research Professor of Lamont-Doherty Earth Observatory - Natural Science Expertise

We are interested in applying statistical and neural network techniques to oceanographic datasets from ships, satellites and ARGO floats to optimize their use in understanding how changes in earth's climate are impacting marine ecosystems. We're interested in working with those who have skills in statistics, computer science, and mathematics.
Webpage
Contact

Zoltan Haiman - Professor of Astronomy - Natural Science Expertise

We are interested in searching for massive binary black holes in galactic nuclei. Such binary black holes are expected to be ubiquitous, but have not been found. This is likely because they are too close to each other to be resolved spatially with a telescope. However, the black hole pair can outshine the host galaxy and produce a point-like source with periodic brightness variations. The period of these variations should follow the orbital period of the black hole pair, which is expected to be between days and months. We would like to carry out a search for periodic brightness variations among at least hundreds of thousands (and possibly millions) of galaxies in an existing galaxy catalog, created by the "Palomar Transient Factory" survey of the sky. The challenge is that this catalog was intended for another purpose, and so has extremely heterogeneous time-coverage: some of the objects were observed only a handful of times over 5 years, while others have been observed thousands of times. We are looking for a data scientist PI who has expertise in searching for periodic signals in large catalogs of time-series data with highly heterogeneous time-coverage.
Webpage
Contact

Garud Iyengar - Professor of IEOR - Data Science Expertise

My expertise is in developing optimization algorithms that can efficiently discover low dimensional structure in very high dimensional data. Examples of such problems include LASSO (sparse predictors), group LASSO (small set of correlated predictors), sparse PCA, foreground detection (sparse set of moving objects in a quasi-static background), etc. Many data science applications can be modeled by optimization problems of this nature. I would very much enjoy collaborating on problems where one is attempting to discover hidden structure using statistical or optimization tools.
Webpage
Contact

Becky Passonneau - Director, Center for Computational Learning Systems - Data Science Expertise (NLP)

We look at features or other aspects of modeling (e.g., labels) to enhance prediction of phenomena in the natural world based on information found in textual sources. A suitable project would have existing datasets that could be supplemented with information drawn from textual sources, including social media, news, chat rooms, web sites, etc., and a modeling problem that involves prediction. The other PI could be a data scientist with an interest in and expertise in an area of the natural sciences, or could be from the natural sciences with less expertise in data analysis techniques. In the former case, the team focus could be on integration of a wider range of features for modeling, and modeling techniques that lead to more interpretable models. In the latter case, the team focus could be on application of straightforward data analysis techniques to develop scientifically interesting models.
Webpage
Contact

Abhay Pasupathy - Asst. Professor of Physics - Natural Science Expertise

We collect microscopy data on new materials at the nanoscale. These are in general a sequence of two-dimensional images at different energies. The contrast in these images arises for a number of different reasons related to the atomic structure and electronic motion in a given image. We would like to statistically analyze these images in a number of ways: to look for spatial patterns, to identify spatial transforms that provide a better representation of the images, and so on. We then would like to connect these statistical descriptions of the data to models of atomic and electronic structure, perhaps using optimization techniques. Our collaborator should have experience in statistical analysis as applied to images, for example in pattern recognition. Since many of the features seen in our images come from the wave nature of electrons, some experience in wave propagation or similar areas is helpful.
Webpage
Contact

Michael Puma - Assoc. Research Scientist of CCSR - Natural Science Expertise

The world food crisis in 2008 highlighted the fragility of the global food trade network, but overall network susceptibility to extreme stressors remains unclear. My colleagues and I have shown that the global food trade network is increasingly vulnerable to worldwide collapse, which has especially severe implications for poor countries. We would like to continue our examination of the global food system -- using the tools of complex systems science -- to understand network fragility. We will examine how food reserves and changes in food-production self-sufficiency impact the vulnerability of the global food trade network. Ultimately, we want to inform policy recommendations that balance the efficiency of international trade (and its associated specialization) with increased resilience of domestic production and global demand diversity. Our project would benefit greatly from collaboration with a "data science" faculty member who has a statistical understanding of complex networks and the expertise to understand the vulnerability of such systems to disturbances. We will analyze detailed bilateral trade data as well as climate data (for extremes in precipitation and temperature) to better understand how to strengthen our global food system. We would also welcome collaboration with faculty who have experience with analogous complex systems and have novel ideas on how to explore the fragility of the global food system.
Webpage
Contact

Ansaf Salleb-Aouissi - Associate Research Scientist of CCLS - Data Science, Machine Learning, and Data Mining Expertise

My background is in computer science and machine learning. My PhD thesis work was on mining patterns, with application to GIS data. I explored the spatial and non-spatial relationships between different georeferenced objects, with a view to understanding subduction phenomena. The data I worked on included geology, mineralogy, seismology, volcanic, geodynamics and other layers for the entire Andes Cordillera. This work was done in the context of a project with the French Geological Survey (BRGM) in France. Since then, I've worked on other projects in domains ranging from ecology (Postdoc at INRIA, Rennes) to the power grid (for several years with Con Edison) and, most recently, clinical informatics in collaboration with physicians from CUMC. A decade after my PhD, I still find mining geo-referenced data exciting from a geo-science perspective. It is also compelling to me because it challenges machine learning as data become more complex, interconnected and larger. With the advent of finer measuring instruments that collect rich data, added to large historical data, there is a chance that we can now understand and predict natural phenomena in ways that were not possible before, or at least make progress toward that. I would love to explore collaboration possibilities.
Webpage
Contact

Van-Anh Truong - Asst. Professor of IEOR - Data Science Expertise

My expertise is in mathematical modeling and optimization. I have been working on resource/capacity management problems that have applications in supply-chain management and healthcare. I am interested in working with those who possess domain expertise in the natural sciences.
Webpage
Contact

Chunhua Weng - Asst. Professor of Biomedical Informatics - Data Science Expertise

I am interested in interdisciplinary collaborative projects that synthesize knowledge from fragmented literature to support translational science and literature-based discoveries. I am willing to work with someone who possesses either content expertise that can be used to guide literature selection and domain modeling, or pattern recognition or network analysis skills, depending on whether your role is that of content expert or data scientist.
Webpage
Contact


550 W. 120th St., Northwest Corner 1401, New York, NY 10027    212-854-5660
©2017 Columbia University