As the smoke from the Canadian wildfires cleared out of the New York City area, researchers at the Data Science Institute were asked to consider how data science can support climate models, predictions, and risk prevention. Daniel M. Westervelt, a DSI faculty affiliate and environmental engineer, has spent his career investigating the effects of climate change in a global context. 

Westervelt is currently a Lamont Associate Research Professor at Columbia University’s Lamont-Doherty Earth Observatory in Palisades, NY; an affiliated scientist with the NASA Goddard Institute for Space Studies in New York, NY; and an air pollution advisor to the US State Department. 

In 2022, Westervelt received a DSI Seed Funds grant to develop joint work with Xiaofan (Fred) Jiang, Associate Professor of Electrical Engineering at Columbia Engineering. Focusing specifically on air pollution data, the work applied a novel bias-correction algorithm to a global network of consumer-grade air quality sensors, advancing access to high-quality data for communities.

As an expert in the space, he spent the past week weighing in on the wildfires for PBS, CNN, The Weather Channel, The New York Times, CBC, Bloomberg News, and others. DSI talked to Westervelt about his research, techniques, and what we can collectively learn from our recent experience with hazardous air quality.


What motivated you to specialize in climate science and air pollution?

As a second-year undergraduate student in the early 2000s, I took a great class on environmental sustainability from a science and engineering perspective. I loved the class and the topic, and really just wanted to dedicate my career to working on our grand environmental challenges. 

Which statistical or analytical techniques or practices do you use when working with air pollution data?

So many. Some of them are rather boilerplate, like linear regression and significance testing — but we also use a lot of cutting-edge techniques such as AutoML, decision trees, and neural networks. Our group developed a somewhat obscure regression technique that leverages Gaussian Mixture Models and showed its utility for evaluating and calibrating air quality data. We also use machine learning to convert satellite retrievals of aerosols to surface air quality data. 
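The Gaussian Mixture Model regression idea can be sketched roughly as follows. This is a hypothetical illustration on synthetic data, not the group's actual algorithm: fit a 2-D mixture over (raw sensor, reference monitor) pairs, then use each component's conditional mean to map raw readings to calibrated values.

```python
# Hypothetical sketch of GMM-based regression for sensor calibration.
# The synthetic data and all variable names are illustrative only.
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic example: raw sensor readings (x) vs. a reference monitor (y),
# with a nonlinear bias that a single linear fit would miss.
x = rng.uniform(5, 80, size=500)
y = 0.6 * x + 0.004 * x**2 + rng.normal(0, 2, size=500)

gmm = GaussianMixture(n_components=3, random_state=0).fit(np.column_stack([x, y]))

def gmm_regress(x_new):
    """Conditional mean E[y | x] under the fitted 2-D mixture."""
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    resp = np.zeros((len(x_new), gmm.n_components))
    cond = np.zeros_like(resp)
    for k in range(gmm.n_components):
        mu_x, mu_y = gmm.means_[k]
        (sxx, sxy), (_, syy) = gmm.covariances_[k]
        # component responsibility, evaluated on x alone
        resp[:, k] = gmm.weights_[k] * norm.pdf(x_new, mu_x, np.sqrt(sxx))
        # within-component conditional mean of y given x
        cond[:, k] = mu_y + sxy / sxx * (x_new - mu_x)
    resp /= resp.sum(axis=1, keepdims=True)
    return (resp * cond).sum(axis=1)

calibrated = gmm_regress(x)
```

Because the mixture captures several local linear regimes, this kind of calibration can track biases that vary across the concentration range, which a single global regression cannot.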

With the ongoing Canadian wildfires, New York City felt the effects of the climate crisis first hand as air pollution reached a record high. How can data science help tackle current and future climate challenges?

A lot of ways. Data science is really baked into climate and air pollution science these days. One thing I’ve been particularly excited about is the potential for AI/ML models to emulate climate models. Climate models are the backbone of climate prediction and thus climate risk, but they are behemoths in terms of code and computational expense. The ability of AI/ML techniques to emulate the climate model results would be huge for getting reliable climate projections without the computational time and effort needed for traditional climate modeling. 
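The emulation idea can be illustrated with a toy surrogate. This is not a real climate model — the `expensive_model` function and its forcing–response relationship are stand-ins — but it shows the workflow: run the expensive model a limited number of times, then train a cheap statistical emulator on those runs.

```python
# Toy illustration of model emulation (the "climate model" here is a
# made-up analytic function, used only to generate training data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)

def expensive_model(co2, aerosol):
    """Stand-in for an expensive simulation: forcings -> warming response."""
    return 3.0 * np.log(co2 / 280.0) - 0.8 * aerosol

# A small ensemble of "model runs" serves as training data.
co2 = rng.uniform(280, 1120, 200)      # CO2 concentration, ppm
aerosol = rng.uniform(0, 2, 200)       # arbitrary aerosol forcing index
X = np.column_stack([co2, aerosol])
y = expensive_model(co2, aerosol)

emulator = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# The emulator now approximates the expensive model at negligible cost.
pred = emulator.predict([[560.0, 1.0]])
```

A real emulator would be trained on archived output from full climate model ensembles, but the payoff is the same: new scenarios can be evaluated in milliseconds rather than weeks of supercomputer time.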

Can you share more about the air quality numbers that were circulated last week? What additional details should be shared with the public as they relate to our health? 

Scientists look at the concentration of fine particulate matter (PM2.5) and a number of different pollutant gases. The general public tends to use the Air Quality Index (AQI), which was all over the news this week. AQI is a good metric for communicating the data to the public. The AQIs were record setting in NYC this week, at one point reaching 10x what are considered normal, healthy levels. These particles can penetrate deeply into our lungs and bloodstream and cause sickness and, with long-term exposure, premature death. 
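For readers curious how a PM2.5 concentration becomes an AQI number: the US EPA maps concentration ranges to index ranges by linear interpolation. The breakpoints below follow the pre-2024 EPA table for 24-hour PM2.5 and should be checked against the current standard before any operational use.

```python
# Sketch of the US EPA piecewise-linear AQI calculation for PM2.5
# (24-hour average, µg/m³). Breakpoints are from the pre-2024 EPA
# table; verify against the current standard before relying on them.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),        # Good
    (12.1, 35.4, 51, 100),     # Moderate
    (35.5, 55.4, 101, 150),    # Unhealthy for Sensitive Groups
    (55.5, 150.4, 151, 200),   # Unhealthy
    (150.5, 250.4, 201, 300),  # Very Unhealthy
    (250.5, 350.4, 301, 400),  # Hazardous
    (350.5, 500.4, 401, 500),  # Hazardous
]

def pm25_to_aqi(conc):
    """Linearly interpolate within the bracketing breakpoint row."""
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if c_lo <= conc <= c_hi:
            return round((i_hi - i_lo) / (c_hi - c_lo) * (conc - c_lo) + i_lo)
    raise ValueError("concentration outside AQI range")
```

The piecewise structure is why an AQI of 150 does not mean "three times as bad" as an AQI of 50: each band compresses a different-sized concentration range into 50 index points.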

Should additional air quality monitoring equipment be deployed in our areas?

More air quality monitoring equipment would definitely be welcome in NYC. By most standards, NYC is an extremely well-monitored city. However, there are still some areas (typically the less wealthy and less white neighborhoods) that are under-monitored. 

As a Research Professor at Columbia University and an air pollution advisor to the US State Department, what partnerships can academia bridge to make sure the data is accessible for citizens? What are some ways that communities can better understand their air pollution exposure?

Governments and states can make their data open and accessible to the public. We already do this in the United States, and the US State Department runs air quality monitoring internationally at their embassies and consulates and makes the data publicly available. But not all air quality data is open access like this. Making data open access in places where it isn’t would empower communities to take action on their air quality. 

You were awarded a DSI Seed Funds grant last year to support air pollution data collection with low-cost, air quality sensors. Can you share more about this project? What’s next for your research?

We are using data science to develop calibration factors for low-cost air quality sensors so that the data can be trusted. Currently, these sensors are not very reliable for health and regulatory purposes, but with data-driven calibration techniques that can change. 

We’ve prototyped a statistical modeling method that can flexibly take into account most or all of the various factors that can impact the accuracy of air sensors, and that can be applied universally to a diverse set of locations around the globe.