Predicting PM2.5 Concentrations Across USA Using Machine Learning

Bibliographic details
Published in: Earth and Space Science (Hoboken, N.J.), 2023-10, Vol. 10 (10)
Authors: Vignesh, P. Preetham, Jiang, Jonathan H., Kishore, P.
Format: Article
Language: English
Description
Abstract: Economic growth, air pollution, and forest fires in some states in the United States have increased the concentration of particulate matter with a diameter less than or equal to 2.5 μm (PM2.5). Although previous studies have tried to observe PM2.5 both spatially and temporally using aerosol remote sensing and geostatistical estimation, their accuracy was limited by coarse resolution. In this paper, we assess the performance of nine machine learning models in predicting PM2.5: linear regression (LR), decision tree (DT), gradient boosting regression (GBR), AdaBoost regression (ABR), XGBoost (XGB), k‐nearest neighbors (K‐NN), long short‐term memory (LSTM), random forest (RF), and support vector machine (SVM), using PM2.5 station data from 2017 to 2021. To compare the accuracy of the nine models, we evaluated the coefficient of determination (R²), root mean square error (RMSE), Nash‐Sutcliffe efficiency (NSE), root mean square error ratio (RSR), and percent bias (PBIAS). Among the nine models, RF (100 decision trees with a maximum depth of 20) and support vector regression (SVR; nonlinear, degree-3 polynomial kernel) predicted PM2.5 concentrations best. Additionally, comparing the PM2.5 performance metrics showed that the models predicted better in the western United States than in the eastern United States.

We present predictions of PM2.5 concentrations over the United States using various machine learning (ML) algorithms.
We show that ML offers a new approach to analyzing large data sets, owing to its computational speed and ease of implementation on massive amounts of data.
The study helps improve our understanding of the differences among ML algorithms for Earth science research.
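The abstract names five evaluation metrics but does not give their formulas. The sketch below computes them for paired observed and predicted PM2.5 values, assuming the conventions common in hydrologic model evaluation: R² as the squared Pearson correlation, NSE as 1 − Σ(obs − pred)² / Σ(obs − mean(obs))², RSR as RMSE divided by the population standard deviation of the observations, and PBIAS as 100 · Σ(obs − pred) / Σ(obs), so positive values indicate underprediction. These definitions and the function name `evaluate_pm25` are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Dict, Sequence


def evaluate_pm25(obs: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """Compute R2, RMSE, NSE, RSR, and PBIAS for paired observed/predicted values.

    Hypothetical helper; the metric definitions follow common hydrologic
    model-evaluation conventions (assumed, not taken from the paper).
    """
    if len(obs) != len(pred) or len(obs) < 2:
        raise ValueError("obs and pred must be equal-length sequences of at least 2 values")
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_pred = sum(pred) / n

    # Sum of squared errors, and total sum of squares about the observed mean.
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    # Covariance numerator and prediction sum of squares, used for Pearson's r.
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    ssp = sum((p - mean_pred) ** 2 for p in pred)
    if sst == 0 or ssp == 0 or sum(obs) == 0:
        raise ValueError("obs and pred must not be constant, and sum(obs) must be nonzero")

    return {
        "R2": cov ** 2 / (sst * ssp),            # squared Pearson correlation (assumed definition)
        "RMSE": math.sqrt(sse / n),              # same units as the input, e.g. μg/m³
        "NSE": 1.0 - sse / sst,                  # 1 = perfect; < 0 = worse than the observed mean
        "RSR": math.sqrt(sse) / math.sqrt(sst),  # RMSE / population std of obs
        "PBIAS": 100.0 * sum(o - p for o, p in zip(obs, pred)) / sum(obs),  # > 0 = underprediction
    }


metrics = evaluate_pm25([10.0, 12.0, 8.0, 15.0], [11.0, 11.0, 9.0, 14.0])
print(metrics["RMSE"], metrics["PBIAS"])  # 1.0 0.0
```

Under these definitions NSE equals 1 − RSR², so the two metrics carry the same information on different scales; R² differs because it ignores systematic offsets between observations and predictions.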
ISSN: 2333-5084
DOI: 10.1029/2023EA002911