About the Focus Area
Data science has a key role to play in addressing climate change.
The science, policy, and communication practices around data science, machine learning, and artificial intelligence have important implications for the climate crisis and the solutions society will utilize in the future.
From machine learning to data visualization, data science techniques are used to study the effects of climate change on marine biology, land use and restoration, food systems, patterns of change in vector-borne diseases, and other climate-related issues. Data science is a powerful tool to help researchers understand the uncertainties and ambiguities inherent in data, to identify interventions, strategies, and solutions that realize co-benefits for humanity and the environment, and to evaluate the multiple, and sometimes conflicting, goals of decision-makers.
DSI researchers use the methods and tools of the growing field of data science and apply them to issues relevant to climate change and the environment.
Our researchers combine techniques from data science and environmental science to understand patterns in the global food system and develop strategies that make food-supply chains more nutritious and sustainable. They study how machine learning can reduce the uncertainty of climate models, apply deep learning to climate-model super-resolution, and visualize carbon emissions from raw data. Some combine machine learning with simulations of atmospheric turbulence to develop new models that can track air pollutants and reconstruct 3D scalar fields from 2D satellite images. Others provide innovative training in environmental health sciences, including climate and health. Researchers have also developed a data collaborative that harnesses geospatial data to help characterize populations displaced by disaster. This kind of information may help with planning for and responding to large-scale natural disasters associated with climate change.
DSI students complete capstone projects that apply data science techniques to real-world problems. One recent project used climate data to predict heavy snowfall. Using a data set of regional climate simulations, the student team calculated the frequency of large snow storms and tracked how the storm statistics will change in the future. Another project used machine-learning methods to develop mapped estimates of surface ocean CO2 concentrations from the limited ocean data available, helping monitor the ocean carbon sink and predict climate change. We help our students interpret environmental data by teaching how data are managed, stored, and disseminated, as well as how they are enrolled into various narratives and models of climate change.
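The CO2-mapping capstone described above fills gaps between sparse ship-based measurements by regressing observed values against predictor fields available everywhere. The sketch below illustrates that interpolation idea with entirely synthetic data and ordinary least squares standing in for the machine-learning method; the grid, predictors, and coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
ny, nx = 20, 40                               # illustrative global grid
lat = np.linspace(-60, 60, ny)[:, None] * np.ones((1, nx))
sst = 25.0 - 0.25 * np.abs(lat) + rng.normal(0, 0.5, (ny, nx))  # proxy predictor

# synthetic "true" surface pCO2 field that sparse ship tracks sample
pco2_true = 360.0 + 1.5 * sst + 0.1 * np.abs(lat)

# sparse observations: only ~5% of grid cells are ever sampled
mask = rng.random((ny, nx)) < 0.05
X_obs = np.column_stack([sst[mask], np.abs(lat)[mask], np.ones(mask.sum())])
y_obs = pco2_true[mask]

# fit a regression on the sparse points, then map the full grid
coef, *_ = np.linalg.lstsq(X_obs, y_obs, rcond=None)
X_all = np.column_stack([sst.ravel(), np.abs(lat).ravel(), np.ones(ny * nx)])
pco2_map = (X_all @ coef).reshape(ny, nx)
print(np.max(np.abs(pco2_map - pco2_true)))   # near zero for this linear toy field
```

Because the toy field is exactly linear in the predictors, the regression recovers it; real pCO2 depends nonlinearly on many drivers, which is why the actual project uses machine-learning methods rather than a linear fit.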
- Ryan Abernathey, Earth and Environmental Sciences
- Peter deMenocal, Earth and Environmental Sciences and Lamont-Doherty Earth Observatory
- Mu Xu, Lamont-Doherty Earth Observatory
Eddy transport of tracers (e.g., heat, salt, and dissolved chemicals) by mesoscale turbulence is important in climate models. However, the scales of eddy transport are roughly 10 to 200 km, which are not resolved by coarse-resolution global climate models. Consequently, the mesoscale tracer transports must be parameterized using a subgrid scheme. The goal of parameterization is to predict the tendencies of physical variables, including velocity, temperature, and salinity, due to the unresolved turbulent motions. There are many different types of mesoscale subgrid schemes, and many different tuning parameters. However, quantitatively evaluating the performance of subgrid schemes is difficult. In this work, we present a framework to evaluate the accuracy of subgrid schemes quantitatively with a data-driven method. We run a high-resolution simulation with a resolution of about 5 km and consider this our “truth.” With a coarse-graining method, the high-resolution data are projected onto a low-resolution grid. The quantitative aim of eddy parameterization is to mitigate the loss of tracer transport due to the coarse-graining. Based on this consideration, we develop an offline system to calculate the eddy parameterization predictions and evaluate the performance of different subgrid schemes. This work lays a foundation for future data-driven statistical-learning-based methods for ocean eddy tracer transport parameterization.
- Joaquim Goes, Lamont-Doherty Earth Observatory
- Ankit Peshin, Ziyao Zhang and Paridhi Singh, Data Science Institute
This research team travels by ship to different parts of the Atlantic Ocean to collect water samples to study the effects of climate change on marine biology. They are designing an automated system through which seawater may be drawn into their moving ship and continuously analyzed. This automated system is an advancement over the usual method of collecting samples; ocean researchers typically stop their ships at pre-planned locations to collect samples. Data is also being gathered on the diversity of microscopic plant life, particularly plankton, which are critical to the marine ecosystem and to assess the ocean’s ability to sequester carbon dioxide from the atmosphere. Plankton form the basis of many food chains and are an important indicator of an ocean’s health. When fully functional, the system will provide data required to validate satellite images of the ocean now being developed by NASA, NOAA and other agencies.
- Marco Giometto, Civil Engineering and Engineering Mechanics
- Pierre Gentine and Mostafa Momen, Earth and Environmental Engineering
- Carl Vondrick, Computer Science
Satellite images are routinely used to track pollutant dispersion in the atmosphere, but the inherently two-dimensional information is limited and often impedes the development of effective rapid response plans. This project will develop a machine learning model to predict the three-dimensional structure of pollutant concentrations from satellite images of the dispersion process. Machine learning will be combined with high-fidelity simulations of atmospheric turbulence to guide the development of a model to track scalar dispersion as well as a model to reconstruct the corresponding three-dimensional concentration field from two-dimensional satellite-like information.
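The core learning problem here, recovering a 3D concentration field from a 2D column view, can be illustrated with a deliberately simple toy: generate plumes whose vertical profile is a fixed shape scaled by a random amplitude, observe only the column total (the "satellite" view), and fit a least-squares map from the 2D signal back to the profile. The vertical shape, sample counts, and linear model are all illustrative assumptions; the actual project trains on high-fidelity turbulence simulations.

```python
import numpy as np

rng = np.random.default_rng(1)
nz, n_samples = 6, 200                      # vertical levels, training simulations

# synthetic training data: a fixed vertical structure scaled by a random
# amplitude, so the column total seen "from above" determines the profile
shape_z = np.exp(-np.arange(nz) / 2.0)      # decaying vertical structure
amps = rng.uniform(0.5, 2.0, n_samples)
profiles = amps[:, None] * shape_z          # (n_samples, nz) 3D structure at one pixel
column = profiles.sum(axis=1)               # (n_samples,) satellite-like 2D signal

# least-squares "model": column total + bias -> vertical profile
A = np.column_stack([column, np.ones(n_samples)])
W, *_ = np.linalg.lstsq(A, profiles, rcond=None)

# reconstruct a profile from a new 2D observation
new_col = 1.3 * shape_z.sum()
recon = np.array([new_col, 1.0]) @ W
print(np.max(np.abs(recon - 1.3 * shape_z)))  # near zero in this linear toy
```

Real atmospheric dispersion is far from this linear regime, which is why the project pairs machine learning with turbulence simulations instead of a closed-form inversion; the toy only shows what the input-output pairing looks like.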
- Zsuzsa Marka and Szabolcs Marka, Physics and Columbia Astrophysics Laboratory
- John Wright, Electrical Engineering
- Zelda Moran, Earth Institute
Control of tsetse flies, the vector responsible for African Trypanosomiasis (sleeping sickness), is highly dependent on precise, high-volume, and cost-effective separation of tsetse genders. To enable broad deployment, this team is pioneering machine-learning-based robotic systems that use infrared imaging to peek inside tsetse pupae for early, robust, and fast identification of males, which are then used to suppress the wild tsetse population.
- Ruth DeFries, Ecology, Evolution and Environmental Biology
- Walter Baethgen, International Research Institute for Climate and Society
- Michael Puma, Center for Climate Systems Research, NASA Goddard Institute for Space Studies
- Kyle Davis, Data Science Institute
This project combines field work with data-driven techniques to study how to improve patterns of food trade in Latin America, sub-Saharan Africa, and South and Southeast Asia. Food trade patterns are essentially the import and export links that connect the production of food in one country to the consumption of it in another. If, for instance, an exporting country experiences a shortage in food production, it may be unable to provide the usual amount of exports to its trade partners. This project assesses how vulnerable these exporting countries are to production shortfalls, and how importing countries may buffer themselves against these possible shortfalls so they are not adversely affected. Research in food-system sustainability aims to minimize the effects of food production on the environment, working with local residents and experts to adapt agricultural systems to protect the environment and mitigate climate change.
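The vulnerability question above can be made concrete with a toy trade matrix: when one exporter suffers a production shortfall and rations all of its exports proportionally, each importer loses a fraction of its total imports. The countries, tonnages, and rationing rule below are invented for illustration, not drawn from the project's data.

```python
import numpy as np

# hypothetical trade matrix: exports[i, j] = tonnes shipped from country i to j
countries = ["A", "B", "C"]
exports = np.array([
    [0.0, 40.0, 10.0],   # A exports mostly to B
    [5.0,  0.0, 30.0],   # B exports mostly to C
    [0.0, 20.0,  0.0],
])

def import_loss(exports, exporter, shortfall_frac):
    """Fraction of each country's imports lost when one exporter cuts
    all of its exports by shortfall_frac (proportional rationing)."""
    imports = exports.sum(axis=0)                 # total imports per country
    lost = shortfall_frac * exports[exporter]     # absolute loss per importer
    return np.divide(lost, imports, out=np.zeros_like(lost), where=imports > 0)

# a 50% production shortfall in country A
loss = import_loss(exports, exporter=0, shortfall_frac=0.5)
print(dict(zip(countries, loss.round(3))))
```

In this toy, country B loses a third of its imports while C loses an eighth, showing how the same shortfall hits trade partners very unevenly, which is the kind of asymmetry the project quantifies.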
- Christoph Meinrenken, Earth Institute
Carbon Catalogue, a free online interactive tool, visualizes the carbon emissions of hundreds of consumer and commercial products around the world. Co-developed with CoClear, a Columbia-alumna-founded sustainability analytics firm, Carbon Catalogue is based on raw data from CDP (formerly the Carbon Disclosure Project). Its visualization features were further honed during a DSI hackathon.
- Robert Chen, Center for International Earth Science Information Network (CIESIN)
Groups tasked with planning for and responding to disasters and humanitarian crises contend with data that is often fragmented, delayed, and of limited reliability. This project focuses on setting up a data collaborative and modalities of communication involving key users from the Platform on Disaster Displacement (PDD), Columbia University's humanitarian research community, selected commercial providers, and other relevant data science organizations and experts. It also identifies the data and information needs of the humanitarian and displacement-tracking community and develops pilot tests of selected data streams (Internet location, night-time lights, etc.).
- Gavin Schmidt, NASA Goddard Institute for Space Studies and Earth Institute
Every winter the news media covers stories of massive disruption caused by large snowfalls for which cities and counties in the Eastern U.S. are apparently ill-prepared. The impact of large snow events is worse in regions that rarely get them, and the observational statistics of their likelihood are limited because of their rarity. Nonetheless, these statistics are expected to change because of two possibly counterbalancing factors in climate change: overall warming, which might reduce snow events at the southern edge of the region, and greater precipitation intensity with higher atmospheric water-vapor content, which might increase heavy snowfalls. Using a 50-member regional climate model ensemble, we explore the statistics of highly impactful snowfalls and address where and when we might be able to detect and expect significant changes over time.
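An ensemble is what makes rare-event statistics tractable: with many members, the frequency of threshold-exceeding events can be estimated per member and compared between periods. The sketch below uses synthetic snowfall draws and an arbitrary threshold; the distributions, units, and shift between periods are illustrative assumptions, not model output.

```python
import numpy as np

rng = np.random.default_rng(2)
n_members, n_winters = 50, 40          # ensemble members x simulated winters
threshold = 30.0                        # cm, "large snow event" cutoff (illustrative)

# synthetic peak snowfall per winter; the late period's distribution is
# shifted slightly to mimic changing storm statistics
early = rng.gamma(2.0, 8.0, (n_members, n_winters))
late = rng.gamma(2.0, 9.0, (n_members, n_winters))

def event_frequency(snow, threshold):
    """Per-member fraction of winters whose peak snowfall exceeds the threshold."""
    return (snow > threshold).mean(axis=1)

f_early = event_frequency(early, threshold)
f_late = event_frequency(late, threshold)
print(f"early: {f_early.mean():.3f} +/- {f_early.std():.3f}")
print(f"late:  {f_late.mean():.3f} +/- {f_late.std():.3f}")
```

The spread of per-member frequencies is the key quantity: a change in the ensemble mean is only detectable once it exceeds that internal-variability spread, which is exactly the "where and when" question the project addresses.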
- Galen McKinley, Earth and Environmental Science and Lamont Doherty Earth Observatory
Climate is changing due to human emissions of carbon to the atmosphere, but not all of the carbon emitted remains there. In fact, over the course of the industrial era, the ocean has absorbed the equivalent of 41 percent of all human fossil-fuel-derived carbon dioxide emissions, a phenomenon known as the ocean carbon sink. Studying the ocean carbon cycle is critical to understanding and predicting climate change. It is also essential for efforts to limit climate change by reducing the growth rate of atmospheric CO2 concentrations. Ocean data are quite sparse, and CO2 in water cannot be directly measured from space. This team uses machine-learning methods to develop mapped estimates of surface ocean CO2 concentrations from the limited data available.
Interpreting Urban Environmental Data
This pilot course is part of a broader rethinking of archaeology at Columbia. Faculty are working toward a more integrated archaeology program that brings together expertise and resources from different departments, and toward a course that may be offered at the M.A. level and better serve our undergraduate population. The course also provides training in historical terrestrial palaeoecology and cultivates an informed historical consciousness and an understanding of the wider repercussions beyond the field of scientific research and reporting.