A land use regression model using machine learning and locally developed low cost particulate matter sensors in Uganda

Bibliographic details
Published in:	Environmental research 2021-08, Vol.199, p.111352-111352, Article 111352
Main authors:	Coker, Eric S., Amegah, A. Kofi, Mwebaze, Ernest, Ssematimba, Joel, Bainomugisha, Engineer
Format:	Article
Language:	English
Subjects:	Land use regression; Low-cost sensors; Machine learning; Particulate matter
Online access:	Full text

Description
Abstract:	Land use regression (LUR) modeling has only rarely been applied to estimate air pollution exposure in sub-Saharan Africa (SSA). This is generally due to a lack of air quality monitoring networks in the region. Low-cost air quality sensors developed locally in sub-Saharan Africa present a sustainable operating mechanism that may help generate the air monitoring data needed for exposure estimation of air pollution with LUR models. The primary objective of our study was to investigate whether a network of locally developed low-cost air quality sensors can be used in LUR modeling for accurately predicting monthly ambient fine particulate matter (PM2.5) air pollution in urban areas of central and eastern Uganda. Secondarily, we aimed to explore whether the application of machine learning (ML) can improve LUR predictions compared to ordinary least squares (OLS) regression. We used data for the entire year of 2020 from a network of 23 low-cost PM2.5 sensors located in urban municipalities of eastern and central Uganda. Between January 1, 2020 and December 31, 2020, these sensors collected highly time-resolved measurements of PM2.5 air concentrations. We used monthly-averaged PM2.5 concentration data for LUR prediction modeling of monthly PM2.5 concentrations. We used eight different ML base-learner algorithms as well as ensemble modeling. We applied 5-fold cross-validation (80% training/20% test random splits) to evaluate the models with resampling and root mean squared error (RMSE). The relative explanatory power and accuracy of the ML algorithms were evaluated by comparing the coefficient of determination (R2) and RMSE, using OLS as the reference approach. The overall average PM2.5 concentration during the study period was 52.22 μg/m3 (IQR: 38.11, 62.84 μg/m3), well above World Health Organization PM2.5 ambient air guidelines.
From the base-learner and ensemble models, RMSE and R2 values ranged from 7.65 to 16.85 μg/m3 and from 0.24 to 0.84, respectively. Extreme gradient boosting (xgbTree) performed best among the base-learner algorithms (R2 = 0.84; RMSE = 7.65 μg/m3). Ensemble modeling with Lasso and Elastic-Net Regularized Generalized Linear Models (glmnet) did not outperform xgbTree, but its prediction performance was comparable to that of xgbTree. The most important temporal and spatial predictors of monthly PM2.5 levels were monthly precipitation, percent of the population using solid fuels for cooking, distance to La
ISSN:	0013-9351 1096-0953
DOI:	10.1016/j.envres.2021.111352
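
Illustrative sketch (not part of the catalog record or the article): the abstract evaluates models with 5-fold cross-validation, RMSE, and R2 against an OLS reference. The following minimal Python example shows one way such an evaluation could be set up for the OLS baseline alone. Everything in it is an assumption made for illustration: the data are synthetic, the single predictor named `precip` and its coefficients are invented, the function names are hypothetical, and pooling out-of-fold predictions before scoring is a design choice that the abstract does not specify. It does not reproduce the study's models or results.

```python
import math
import random


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_ols(x, y):
    """Closed-form OLS for one predictor; returns (intercept, slope)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(x, y, k=5, seed=0):
    """k-fold CV: fit on k-1 folds, predict the held-out fold, pool, score."""
    observed, predicted = [], []
    for test_fold in k_fold_indices(len(x), k, seed):
        held_out = set(test_fold)
        train = [i for i in range(len(x)) if i not in held_out]
        intercept, slope = fit_ols([x[i] for i in train], [y[i] for i in train])
        for i in test_fold:
            observed.append(y[i])
            predicted.append(intercept + slope * x[i])
    return rmse(observed, predicted), r_squared(observed, predicted)


# Synthetic, hypothetical data: a made-up linear link between monthly
# precipitation (mm) and PM2.5 (ug/m3). These are NOT the study's values.
rng = random.Random(42)
precip = [rng.uniform(0, 300) for _ in range(100)]
pm25 = [70 - 0.1 * p + rng.gauss(0, 5) for p in precip]

cv_rmse, cv_r2 = cross_validate(precip, pm25, k=5)
print(f"5-fold CV RMSE: {cv_rmse:.2f}  R2: {cv_r2:.2f}")
```

With k = 5, each held-out fold contains 20% of the observations, which corresponds to the 80%/20% split mentioned in the abstract. Comparing an ML learner against this baseline would mean swapping `fit_ols` for another fitting routine while keeping the folds and metrics fixed.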