Ensemble learning models for the prediction of the weekly peak of PM 2.5 concentration in Algiers, Algeria


Bibliographic details
Published in: Journal of Air Pollution and Health, 2023-10, Vol. 8 (3)
Main authors: Ghazi, Sabri, Dib, Ahmed, Said, Mohamed, Tarek, Khadir, Dugdale, Julie
Format: Article
Language: English
Subjects:
Online access: Full text
Description
Summary: Introduction: This paper focuses on the prediction of weekly peak levels of Particulate Matter with an aerodynamic diameter of less than 2.5 µm (PM 2.5), using various Machine Learning (ML) models. The study compares ML models to deep learning models and emphasizes the explainability of ML models for PM 2.5 prediction. Materials and methods: We examine different combinations of features and time window dimensions to evaluate the performance of ML models. We use Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), Decision Tree (DT), and five Ensemble Learning (EL) models: AdaBoost, XGBoost, LightGBM, CatBoost, and Random Forest (RF). The dataset includes three years of daily measurements of weather parameters and PM 2.5. Results: Lagged values of PM 2.5 improve prediction performance, particularly when the lagged value window size spans seven days or multiples thereof. This confirms that road traffic, which exhibits a weekly seasonality, is the primary source of PM 2.5 in Algiers. Interestingly, including lagged values of weather parameters decreases prediction performance, even when they are chosen based on their correlation with PM 2.5. The AdaBoost model performs best, achieving a Root Mean Squared Error (RMSE) of 2.899 µg/m³ and an R² value of 0.96. Conclusion: EL models, specifically AdaBoost, exhibit strong performance in predicting PM 2.5 levels. They not only provide accurate predictions but also allow analysis of feature importance. Lagged values of PM 2.5 have a greater impact on predictions than weather parameters. Surprisingly, including weather parameters hampers prediction performance. Therefore, the use of ensemble learning models offers valuable insights into feature significance.
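The lagged-window setup and the two reported metrics can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the non-overlapping seven-day weeks, and the choice of the weekly maximum as the "peak" are assumptions made for illustration, and the toy series is synthetic rather than the Algiers measurements.

```python
from math import sqrt


def weekly_peak_dataset(daily_pm25, window=7, week_len=7):
    """Pair each week's peak with the `window` daily values preceding it.

    Assumption: weeks are consecutive, non-overlapping blocks of `week_len`
    days, and the "weekly peak" is the maximum daily value in the block.
    """
    features, targets = [], []
    start = window
    while start + week_len <= len(daily_pm25):
        features.append(list(daily_pm25[start - window:start]))  # lagged PM 2.5
        targets.append(max(daily_pm25[start:start + week_len]))  # weekly peak
        start += week_len
    return features, targets


def rmse(y_true, y_pred):
    """Root Mean Squared Error, in the units of the target (e.g. µg/m³)."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination; undefined if y_true is constant."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic example: 28 days of made-up daily PM 2.5 values.
daily = [10, 12, 15, 14, 13, 9, 8] * 4
X, y = weekly_peak_dataset(daily, window=7)
```

The resulting `X` and `y` could then be passed to any regressor, for example scikit-learn's `AdaBoostRegressor`, and the predictions scored with `rmse` and `r2_score`. Setting `window` to 14 or 21 would correspond to the multiples of seven days mentioned in the summary.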
ISSN:2476-3071
DOI:10.18502/japh.v8i3.13783