Ensemble-based deep learning for estimating PM2.5 over California with multisource big data including wildfire smoke

Bibliographic details
Published in:	Environment International, December 2020, Vol. 145, Article 106143
Authors:	Li, Lianfa, Girguis, Mariam, Lurmann, Frederick, Pavlovic, Nathan, McClure, Crystal, Franklin, Meredith, Wu, Jun, Oman, Luke D., Breton, Carrie, Gilliland, Frank, Habre, Rima
Format:	Article
Language:	English
Keywords:	Air pollution exposure; California; Computer Programming And Software; Earth Resources And Remote Sensing; Environment Pollution; Environmental Sciences; Environmental Sciences & Ecology; Geosciences (General); High spatiotemporal resolution; Life Sciences & Biomedicine; Machine learning; PM2.5; Remote sensing; Science & Technology; Wildfires

Description
Abstract:
Highlights:
• California has high variability in PM2.5 sources, meteorology and topography.
• We used ensemble deep learning with multisource big data to improve PM2.5 estimates.
• We reliably imputed missing satellite AOD and fused wildfire dispersion estimates.
• Our model achieved high PM2.5 prediction performance with uncertainty estimates.

Estimating PM2.5 concentrations and their prediction uncertainties at high spatiotemporal resolution is important for studies of the health effects of air pollution. This is particularly challenging for California, which has high variability in natural (e.g., wildfires, dust) and anthropogenic emissions, meteorology, topography (e.g., desert surfaces, mountains, snow cover) and land use. Using ensemble-based deep learning with big data fused from multiple sources, we developed a PM2.5 prediction model with uncertainty estimates at high spatial (1 km × 1 km) and temporal (weekly) resolution for a 10-year period (2008–2017). We used autoencoder-based full residual deep networks to model complex nonlinear relationships among PM2.5 emission, transport and dispersion factors and other influential features. These included remote sensing data (MAIAC aerosol optical depth (AOD), normalized difference vegetation index, impervious surface), MERRA-2 GMI Replay Simulation (M2GMI) output, wildfire smoke plume dispersion, meteorology, land cover, traffic, elevation, and spatiotemporal trends (geo-coordinates, temporal basis functions, time index). MAIAC AOD, one of the primary predictors of interest, has substantial missing data in California due to bright surfaces, cloud cover and other known interferences; the missing observations were therefore imputed and adjusted for relative humidity and vertical distribution. The wildfire smoke contribution to PM2.5 was also calculated through HYSPLIT dispersion modeling of smoke emissions derived from MODIS fire radiative power using the Fire Energetics and Emissions Research version 1.0 model.
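The abstract names two ingredients that a small sketch can make concrete: an autoencoder-shaped network with residual (shortcut) connections, and an ensemble whose spread serves as an uncertainty estimate. The NumPy sketch below is not the authors' implementation. It assumes symmetric layer widths, ReLU activations, identity shortcuts from each encoder layer to its mirrored decoder layer, untrained random weights, and the ensemble standard deviation as the uncertainty measure. The helper names (`init_member`, `forward`, `ensemble_predict`) and the layer sizes are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def init_member(layer_sizes, rng):
    """Random (untrained) weights for one hypothetical ensemble member.

    layer_sizes must be symmetric, e.g. [8, 6, 4, 6, 8]: input, encoder,
    bottleneck, decoder. A linear head maps the last layer to one output.
    """
    assert layer_sizes == layer_sizes[::-1], "layer sizes must be symmetric"
    weights = [rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
               for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
    head = rng.normal(0.0, 1.0 / np.sqrt(layer_sizes[-1]),
                      size=(layer_sizes[-1], 1))
    return weights, head


def forward(x, weights, head):
    """Forward pass with identity shortcuts from each encoder activation
    to its mirrored decoder activation (same width by symmetry)."""
    n_act = len(weights) + 1          # number of activations, including the input
    acts = [x]
    h = x
    for k, w in enumerate(weights, start=1):
        h = relu(h @ w)
        mirror = n_act - 1 - k        # index of the mirrored activation
        if mirror < k:                # decoder side: add the shortcut
            h = h + acts[mirror]
        acts.append(h)
    return (h @ head).ravel()         # one prediction per row of x


def ensemble_predict(x, members):
    """Ensemble mean as the estimate, member spread as the uncertainty."""
    preds = np.stack([forward(x, w, hd) for w, hd in members])  # (members, rows)
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)


rng = np.random.default_rng(0)
sizes = [8, 6, 4, 6, 8]
members = [init_member(sizes, rng) for _ in range(5)]
x = rng.normal(size=(3, 8))  # 3 hypothetical cell-weeks, 8 standardized predictors
mean, sd = ensemble_predict(x, members)
print(mean.round(2), sd.round(2))
```

Because the weights here are random, the printed values only demonstrate shapes and aggregation. With trained members (for example, fitted on different bootstrap samples or from different initializations), the per-row standard deviation would give a rough prediction uncertainty alongside the ensemble mean.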
The ensemble deep learning model achieved an overall mean training RMSE of 1.54 μg/m³ (R²: 0.94) and a test RMSE of 2.29 μg/m³ (R²: 0.87). The top predictors included the M2GMI carbon monoxide mixing ratio in the bottom layer, temporal basis functions, spatial location, air temperature, MAIAC AOD, and PM2.5 sea salt mass concentration. In an independent test using three long-term AQS sites and one short-term non-AQS site, our model achieved a high correlation (>0.8) and a low RMSE (
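The reported metrics (RMSE, R² and correlation) can be computed from paired observed and predicted concentrations. This is a minimal sketch assuming the common definitions: RMSE as the square root of the mean squared error, R² as 1 - SS_res/SS_tot, and Pearson's r for correlation. The abstract does not state which R² definition the authors used (the squared correlation is another common choice), and the values below are invented for illustration, not data from the paper.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error, in the units of the data (e.g. μg/m³)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def pearson_r(obs, pred):
    """Pearson correlation between observations and predictions."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


# Hypothetical weekly PM2.5 values (μg/m³) at one monitor, not from the paper.
obs = [10.0, 12.0, 8.0, 15.0]
pred = [11.0, 12.0, 7.0, 14.0]
print(f"RMSE={rmse(obs, pred):.3f}  R2={r_squared(obs, pred):.3f}  "
      f"r={pearson_r(obs, pred):.3f}")
```

The two R² conventions generally give different numbers: 1 - SS_res/SS_tot penalizes bias and scale errors, while r² only measures linear association. That is why a high correlation and a low RMSE are usually reported together, as in the independent test above.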
ISSN:	0160-4120; 1873-6750
DOI:	10.1016/j.envint.2020.106143