A hybrid model for estimating the number concentration of ultrafine particles based on machine learning algorithms in central Taiwan

Bibliographic details
Published in: Environment International, 2023-05, Vol. 175, p. 107937-107937, Article 107937
Main authors: Jung, Chau-Ren; Chen, Wei-Ting; Young, Li-Hao; Hsiao, Ta-Chih
Format: Article
Language: English
Description
Summary:
• In-situ long-term measurements of ultrafine particles (UFP) were conducted in central Taiwan.
• The XGBoost model outperformed random forest and deep neural network models.
• The training and cross-validation R² (nRMSE) were 0.99 (6.5%) and 0.78 (31.0%), respectively.
• Surface pressure and traffic-related variables were important predictors.
• MAIAC AOD and AE were not strong predictors for UFP.

Modeling is a cost-effective way to estimate ultrafine particle (UFP) levels. Previous UFP estimates generally relied on land-use regression, which has insufficient temporal resolution. We carried out in-situ measurements of UFP in central Taiwan and developed a model incorporating satellite-based measurements, meteorological variables, and land-use data to estimate daily UFP levels at a 1-km resolution. Two sampling campaigns measured hourly UFP concentrations at six sites, in 2008–2010 and 2017–2021, using scanning mobility particle sizers. Three machine learning algorithms, namely random forest, eXtreme gradient boosting (XGBoost), and deep neural network, were used to develop UFP estimation models. Model performance was evaluated with 10-fold cross-validation as well as temporal and spatial validation. A total of 1,022 effective sampling days were obtained. The XGBoost model performed best, with a training coefficient of determination (R²) of 0.99 [normalized root mean square error (nRMSE): 6.52%] and a cross-validation R² of 0.78 (nRMSE: 31.0%). The ten most important variables were surface pressure, distance to the nearest road, temperature, calendar year, day of the year, NO2, meridional wind, the total length of roads, PM2.5, and zonal wind. UFP levels were elevated along the main roads in all seasons, suggesting that traffic emission is an important contributor to UFP. This hybrid model outperformed prior land-use regression models and can therefore provide more accurate UFP estimates for epidemiological studies.
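
The two evaluation metrics and the 10-fold split named in the summary can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the sample numbers are made up, and normalizing the RMSE by the observed mean is an assumption, since the summary does not say how nRMSE was normalized.

```python
import math
import random


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def nrmse_percent(observed, predicted):
    """RMSE divided by the observed mean, in percent (assumed normalization)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (sum(observed) / n)


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k disjoint held-out folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


# Made-up numbers for illustration only; they are not data from the study.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]
print(round(r_squared(observed, predicted), 3))      # 0.8
print(round(nrmse_percent(observed, predicted), 2))  # 7.69
```

In a cross-validation loop, each fold returned by `k_fold_indices` would serve once as the held-out set while the model is fitted on the remaining nine, and the metrics would be computed on the held-out predictions. A large gap between training and cross-validation scores, such as the one reported in the summary, is what this procedure is designed to expose.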
ISSN: 0160-4120, 1873-6750
DOI: 10.1016/j.envint.2023.107937