Ensemble-based deep learning for estimating PM 2.5 over California with multisource big data including wildfire smoke

Estimating PM2.5 concentrations and their prediction uncertainties at a high spatiotemporal resolution is important for air pollution health effect studies. This is particularly challenging for California, which has high variability in natural (e.g., wildfires, dust) and anthropogenic emissions, meteoro...

Detailed description

Bibliographic details
Published in: Environment international 2020-12, Vol.145, p.106143
Main authors: Li, Lianfa; Girguis, Mariam; Lurmann, Frederick; Pavlovic, Nathan; McClure, Crystal; Franklin, Meredith; Wu, Jun; Oman, Luke D; Breton, Carrie; Gilliland, Frank; Habre, Rima
Format: Article
Language: eng
Online access: Full text
container_end_page
container_issue
container_start_page 106143
container_title Environment international
container_volume 145
creator Li, Lianfa
Girguis, Mariam
Lurmann, Frederick
Pavlovic, Nathan
McClure, Crystal
Franklin, Meredith
Wu, Jun
Oman, Luke D
Breton, Carrie
Gilliland, Frank
Habre, Rima
description Estimating PM2.5 concentrations and their prediction uncertainties at a high spatiotemporal resolution is important for air pollution health effect studies. This is particularly challenging for California, which has high variability in natural (e.g., wildfires, dust) and anthropogenic emissions, meteorology, topography (e.g., desert surfaces, mountains, snow cover) and land use. Using ensemble-based deep learning with big data fused from multiple sources, we developed a PM2.5 prediction model with uncertainty estimates at a high spatial (1 km × 1 km) and temporal (weekly) resolution for a 10-year time span (2008-2017). We leveraged autoencoder-based full residual deep networks to model complex nonlinear interrelationships among PM2.5 emission, transport and dispersion factors and other influential features. These included remote sensing data (MAIAC aerosol optical depth (AOD), normalized difference vegetation index, impervious surface), MERRA-2 GMI Replay Simulation (M2GMI) output, wildfire smoke plume dispersion, meteorology, land cover, traffic, elevation, and spatiotemporal trends (geo-coordinates, temporal basis functions, time index). As one of the primary predictors of interest with substantial missing data in California related to bright surfaces, cloud cover and other known interferences, missing MAIAC AOD observations were imputed and adjusted for relative humidity and vertical distribution. Wildfire smoke contribution to PM2.5 was also calculated through HYSPLIT dispersion modeling of smoke emissions derived from MODIS fire radiative power using the Fire Energetics and Emissions Research version 1.0 model. Ensemble deep learning to predict PM2.5 achieved an overall mean training RMSE of 1.54 μg/m³ (R²: 0.94) and test RMSE of 2.29 μg/m³ (R²: 0.87). The top predictors included M2GMI carbon monoxide mixing ratio in the bottom layer, temporal basis functions, spatial location, air temperature, MAIAC AOD, and PM2.5 sea salt mass concentration. In an independent test using three long-term AQS sites and one short-term non-AQS site, our model achieved a high correlation (>0.8) and a low RMSE (<3 μg/m³). Statewide predictions indicated that our model can capture the spatial distribution and temporal peaks in wildfire-related PM2.5. The coefficient of variation indicated highest uncertainty over deciduous and mixed forests and open water land covers. Our method can be generalized to other regions, including those having a mix of major urban areas, deserts, intensive smoke events, snow cover and complex terrains, where PM2.5 has previously been challenging to predict. Prediction uncertainty estimates can also inform further model development and measurement error evaluations in exposure and health studies.
doi_str_mv 10.1016/j.envint.2020.106143
format Article
addtitle Environ Int
pmid 32980736
publisher Netherlands
rights Copyright © 2020 The Author(s). Published by Elsevier Ltd. All rights reserved.
toplevel peer_reviewed; online_resources
fulltext fulltext
identifier EISSN: 1873-6750
ispartof Environment international, 2020-12, Vol.145, p.106143
issn 1873-6750
language eng
recordid cdi_pubmed_primary_32980736
source MEDLINE; DOAJ Directory of Open Access Journals; ScienceDirect Journals (5 years ago - present); EZB-FREE-00999 freely available EZB journals
subjects Air Pollutants - analysis
Air Pollution - analysis
Big Data
California
Deep Learning
Environmental Monitoring
Particulate Matter - analysis
Smoke
Wildfires
title Ensemble-based deep learning for estimating PM 2.5 over California with multisource big data including wildfire smoke
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T20%3A21%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pubmed&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Ensemble-based%20deep%20learning%20for%20estimating%20PM%202.5%20over%20California%20with%20multisource%20big%20data%20including%20wildfire%20smoke&rft.jtitle=Environment%20international&rft.au=Li,%20Lianfa&rft.date=2020-12&rft.volume=145&rft.spage=106143&rft.pages=106143-&rft.eissn=1873-6750&rft_id=info:doi/10.1016/j.envint.2020.106143&rft_dat=%3Cpubmed%3E32980736%3C/pubmed%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/32980736&rfr_iscdi=true