Generating high spatial resolution exposure estimates from sparse regulatory monitoring data


Bibliographic details
Published in: Atmospheric environment (1994), 2023-11, Vol. 313, p. 120076, Article 120076
Main authors: Ge, Yihui; Yang, Zhenchun; Lin, Yan; Hopke, Philip K.; Presto, Albert A.; Wang, Meng; Rich, David Q.; Zhang, Junfeng
Format: Article
Language: English
Description
Abstract: Random forest algorithms have been used extensively to estimate ambient air pollutant concentrations. However, the accuracy of model-predicted estimates can suffer from extrapolation problems when only limited measurement data are available to train the machine learning algorithms. In this study, we developed and evaluated two approaches, both incorporating low-cost sensor data, that enhanced the extrapolating ability of random forest models in areas with sparse monitoring data. The study area was Rochester, NY, the site of a pregnancy cohort study. Daily PM2.5 concentrations from the NAMS/SLAMS sites were obtained and used as the response variable in the model, with satellite, meteorological, and land-use variables included as predictors. To improve the base random forest models, we used PM2.5 measurements from a pre-existing low-cost sensor network and then conducted a two-step backward selection to gradually eliminate variables with potential emission heterogeneity from the base models. We then introduced the regression-enhanced random forest method into the model development. Finally, contemporaneous urinary 1-hydroxypyrene was used to evaluate the PM2.5 predictions generated by the two approaches. The two-step approach increased the average external validation R² from 0.49 to 0.65 and decreased the RMSE from 3.56 μg/m³ to 2.96 μg/m³. For the regression-enhanced random forest models, the average external validation R² was 0.54 and the RMSE was 3.40 μg/m³. We also observed significant and comparable relationships between urinary 1-hydroxypyrene levels and PM2.5 predictions from both improved models. This PM2.5 model estimation strategy could improve the extrapolating ability of random forest models in areas with sparse monitoring data.
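The external validation metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes R² is the coefficient of determination (1 minus the residual sum of squares over the total sum of squares); the abstract does not state which definition the authors used. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the units of the inputs (e.g. ug/m3)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical daily PM2.5 values at a held-out monitor (not study data).
held_out_obs = [6.2, 8.9, 12.4, 5.1, 9.7]
held_out_pred = [7.0, 8.1, 10.9, 6.0, 9.2]
print(f"RMSE = {rmse(held_out_obs, held_out_pred):.2f} ug/m3")
print(f"R2   = {r_squared(held_out_obs, held_out_pred):.2f}")
```

For scale, the reported drop in RMSE from 3.56 μg/m³ to 2.96 μg/m³ for the two-step approach corresponds to a reduction of roughly 17%.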
• Developed two approaches to enhance the extrapolating ability of random forest models for estimating ambient PM2.5.
• Introduced the LASSO-regression enhanced random forest method into model development.
• Evaluated the model predictions with urinary 1-OHP levels.
• Provided feasible strategies for air pollution exposure assessment in areas with sparse monitoring data.
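The LASSO-regression enhanced random forest named in the highlights can be written out as a short worked formulation. This is a sketch of the method as it is commonly described, not the authors' exact specification: the record gives no tuning details, and the symbols below (penalty \lambda, residuals r_i, forest \hat{f}_{RF}) are notation assumed here for illustration.

```latex
% Step 1: fit a LASSO regression of the response y_i (daily PM2.5)
% on the predictors x_i.
\hat{\beta}_{\lambda}
  = \arg\min_{\beta}\;
    \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - x_i^{\top}\beta\bigr)^{2}
    + \lambda\,\lVert\beta\rVert_{1}

% Step 2: compute the LASSO residuals and train a random forest
% \hat{f}_{RF} on the pairs (x_i, r_i).
r_i = y_i - x_i^{\top}\hat{\beta}_{\lambda}

% Step 3: predict at a new location x as the sum of the two parts.
\hat{y}(x) = x^{\top}\hat{\beta}_{\lambda} + \hat{f}_{RF}(x)
```

Under this formulation the linear term can extend beyond the range of the training responses, whereas a random forest on its own averages observed training values and so cannot predict outside that range.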
ISSN: 1352-2310, 1873-2844
DOI: 10.1016/j.atmosenv.2023.120076