Predicting PM2.5 Concentrations Across USA Using Machine Learning

Bibliographic details
Published in: Earth and space science (Hoboken, N.J.), 2023-10, Vol. 10 (10)
Main authors: Vignesh, P. Preetham; Jiang, Jonathan H.; Kishore, P.
Format: Article
Language: English
Online access: Full text
container_end_page
container_issue 10
container_start_page
container_title Earth and space science (Hoboken, N.J.)
container_volume 10
creator Vignesh, P. Preetham
Jiang, Jonathan H.
Kishore, P.
description Economic growth, air pollution, and forest fires in some states of the United States have increased the concentration of particulate matter with a diameter less than or equal to 2.5 μm (PM2.5). Although previous studies have tried to observe PM2.5 both spatially and temporally using aerosol remote sensing and geostatistical estimation, their accuracy was limited by coarse resolution. In this paper, the performance of nine machine learning models in predicting PM2.5 is assessed: linear regression (LR), decision tree (DT), gradient boosting regression (GBR), AdaBoost regression (ABR), XGBoost (XGB), k‐nearest neighbors (K‐NN), long short‐term memory (LSTM), random forest (RF), and support vector machine (SVM), using PM2.5 station data from 2017 to 2021. To compare the accuracy of the nine models, the coefficient of determination (R²), root mean square error (RMSE), Nash‐Sutcliffe efficiency (NSE), root mean square error ratio (RSR), and percent bias (PBIAS) were evaluated. Among the nine models, RF (100 decision trees with a maximum depth of 20) and support vector regression (SVR; nonlinear degree‐3 polynomial kernel) predicted PM2.5 concentrations best. Comparison of the performance metrics also showed that the models predicted PM2.5 better in the western United States than in the eastern United States. We present predictions of PM2.5 concentrations over the United States using various machine learning (ML) algorithms. We show ML as a new approach for analyzing large data sets, owing to its computational speed and ease of implementation on massive amounts of data. The study is important for improving our understanding of the differences among ML algorithms for Earth science research.
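The five evaluation metrics named in the abstract can all be computed from paired observed and predicted concentrations. The sketch below is a minimal, dependency-free illustration, not the authors' code: the function name `evaluate` is hypothetical, R² is assumed to be the squared Pearson correlation (so that it differs from NSE, which equals 1 − SS_res/SS_tot), and PBIAS is assumed to follow the convention in which positive values indicate underprediction. This record does not state which conventions the paper actually used.

```python
import math
from typing import Dict, Sequence


def evaluate(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: R2, RMSE, NSE, RSR and PBIAS for paired PM2.5 values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    total_obs = sum(observed)
    if total_obs == 0:
        raise ValueError("PBIAS is undefined when the observations sum to zero")

    mean_obs = total_obs / n
    mean_pred = sum(predicted) / n

    # Residual sum of squares and total sum of squares about the observed mean.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observations are constant, so NSE and RSR are undefined")

    # Assumption: R2 is the squared Pearson correlation between observed and predicted.
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    r2 = cov ** 2 / (ss_tot * ss_pred) if ss_pred > 0 else math.nan

    rmse = math.sqrt(ss_res / n)
    nse = 1.0 - ss_res / ss_tot                  # 1.0 is a perfect fit
    rsr = math.sqrt(ss_res) / math.sqrt(ss_tot)  # RMSE / population SD of observations
    # Assumption: positive PBIAS means the model underpredicts on average.
    pbias = 100.0 * sum(o - p for o, p in zip(observed, predicted)) / total_obs

    return {"R2": r2, "RMSE": rmse, "NSE": nse, "RSR": rsr, "PBIAS": pbias}


if __name__ == "__main__":
    metrics = evaluate([10.0, 12.0, 8.0, 15.0], [11.0, 11.5, 8.5, 14.0])
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

With the made-up values in the usage block, NSE comes out at about 0.907 while PBIAS is exactly 0.0, because the over- and underpredictions cancel in the bias term even though RMSE is nonzero. Under these definitions RSR² = 1 − NSE, so RSR and NSE carry the same information on different scales, while PBIAS and R² capture systematic bias and correlation separately.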
doi_str_mv 10.1029/2023EA002911
format Article
orcidid https://orcid.org/0000-0002-5929-8951
toplevel peer_reviewed
online_resources
fulltext fulltext
identifier ISSN: 2333-5084
ispartof Earth and space science (Hoboken, N.J.), 2023-10, Vol.10 (10)
issn 2333-5084
2333-5084
language eng
recordid cdi_crossref_primary_10_1029_2023EA002911
source Wiley Online Library Open Access; DOAJ Directory of Open Access Journals; Wiley Online Library Journals Frontfile Complete; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
title Predicting PM2.5 Concentrations Across USA Using Machine Learning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T04%3A46%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Predicting%20PM%202.5%20Concentrations%20Across%20USA%20Using%20Machine%20Learning&rft.jtitle=Earth%20and%20space%20science%20(Hoboken,%20N.J.)&rft.au=Vignesh,%20P.%20Preetham&rft.date=2023-10&rft.volume=10&rft.issue=10&rft.issn=2333-5084&rft.eissn=2333-5084&rft_id=info:doi/10.1029/2023EA002911&rft_dat=%3Ccrossref%3E10_1029_2023EA002911%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true