Predicting PM 2.5 Concentrations Across USA Using Machine Learning
Economic growth, air pollution, and forest fires in some states in the United States have increased the concentration of particulate matter with a diameter less than or equal to 2.5 μm (PM2.5). Although previous studies have tried to observe PM2.5 both spatially and temporally using aerosol remot...
Published in: | Earth and space science (Hoboken, N.J.), 2023-10, Vol.10 (10) |
---|---|
Main authors: | Vignesh, P. Preetham; Jiang, Jonathan H.; Kishore, P. |
Format: | Article |
Language: | eng |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 10 |
container_start_page | |
container_title | Earth and space science (Hoboken, N.J.) |
container_volume | 10 |
creator | Vignesh, P. Preetham ; Jiang, Jonathan H. ; Kishore, P. |
description | Economic growth, air pollution, and forest fires in some states in the United States have increased the concentration of particulate matter with a diameter less than or equal to 2.5 μm (PM2.5). Although previous studies have tried to observe PM2.5 both spatially and temporally using aerosol remote sensing and geostatistical estimation, they were limited in accuracy by coarse resolution. In this paper, the performance of machine learning models on predicting PM2.5 is assessed with linear regression (LR), decision tree (DT), gradient boosting regression (GBR), AdaBoost regression (ABR), XGBoost (XGB), k‐nearest neighbors (K‐NN), long short‐term memory (LSTM), random forest (RF), and support vector machine (SVM) using PM2.5 station data from 2017 to 2021. To compare the accuracy of all the nine machine learning models, the coefficient of determination (R²), root mean square error (RMSE), Nash‐Sutcliffe efficiency (NSE), root mean square error ratio (RSR), and percent bias (PBIAS) were evaluated. Among all nine models, the RF (100 decision trees with a max depth of 20) and support vector regression (SVR; nonlinear kernel, degree 3 polynomial) models were the best for predicting PM2.5 concentrations. Additionally, comparison of the PM2.5 performance metrics displayed that the models had better predictive behavior in the western United States than that in the eastern United States. We present the prediction of PM2.5 concentrations over the United States using various machine learning (ML) algorithms. We show ML as a new approach for analyzing large data sets due to the computational speed and easy implementation for massive amounts of data. The study is important for improving our understanding of the differences among ML algorithms for Earth Science research. |
doi_str_mv | 10.1029/2023EA002911 |
format | Article |
fullrecord | <record><control><sourceid>crossref</sourceid><recordid>TN_cdi_crossref_primary_10_1029_2023EA002911</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_1029_2023EA002911</sourcerecordid><originalsourceid>FETCH-LOGICAL-c801-74051a2db9aeb6767ea40d1d2d4ea45df5d0c9139857631e7b96e51b81f4c93c3</originalsourceid><addsrcrecordid>eNpNkMtOwzAURC0EElXpjg_wB5Byr5_xMkTlIaWiEu06cmwHgsBBdjb8PSmw6GqORqPRaAi5RlgjMHPLgPFNBTMinpEF45wXEkpxfsKXZJXzOwAgkwqYWJC7XQp-cNMQX-luS9la0nqMLsQp2WkYY6aVS2PO9PBS0UM-xrbWvQ0x0CbYFGfjilz09iOH1b8uyf5-s68fi-b54amumsKVgIUWINEy3xkbOqWVDlaAR8-8mEn6XnpwBrkppVYcg-6MChK7EnvhDHd8SW7-an_3pNC3X2n4tOm7RWiPD7SnD_AfeGVLdw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Predicting PM 2.5 Concentrations Across USA Using Machine Learning</title><source>Wiley Online Library Open Access</source><source>DOAJ Directory of Open Access Journals</source><source>Wiley Online Library Journals Frontfile Complete</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Vignesh, P. Preetham ; Jiang, Jonathan H. ; Kishore, P.</creator><creatorcontrib>Vignesh, P. Preetham ; Jiang, Jonathan H. ; Kishore, P.</creatorcontrib><description>Economic growth, air pollution, and forest fires in some states in the United States have increased the concentration of particulate matter with a diameter less than or equal to 2.5 μm (PM2.5). Although previous studies have tried to observe PM2.5 both spatially and temporally using aerosol remote sensing and geostatistical estimation, they were limited in accuracy by coarse resolution. In this paper, the performance of machine learning models on predicting PM2.5 is assessed with linear regression (LR), decision tree (DT), gradient boosting regression (GBR), AdaBoost regression (ABR), XGBoost (XGB), k‐nearest neighbors (K‐NN), long short‐term memory (LSTM), random forest (RF), and support vector machine (SVM) using PM2.5 station data from 2017 to 2021. To compare the accuracy of all the nine machine learning models, the coefficient of determination (R²), root mean square error (RMSE), Nash‐Sutcliffe efficiency (NSE), root mean square error ratio (RSR), and percent bias (PBIAS) were evaluated. Among all nine models, the RF (100 decision trees with a max depth of 20) and support vector regression (SVR; nonlinear kernel, degree 3 polynomial) models were the best for predicting PM2.5 concentrations. Additionally, comparison of the PM2.5 performance metrics displayed that the models had better predictive behavior in the western United States than that in the eastern United States.
We present the prediction of PM2.5 concentrations over the United States using various machine learning (ML) algorithms
We show ML as a new approach for analyzing large data sets due to the computational speed and easy implementation for massive amounts of data
The study is important for improving our understanding of the differences among ML algorithms for Earth Science research</description><identifier>ISSN: 2333-5084</identifier><identifier>EISSN: 2333-5084</identifier><identifier>DOI: 10.1029/2023EA002911</identifier><language>eng</language><ispartof>Earth and space science (Hoboken, N.J.), 2023-10, Vol.10 (10)</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c801-74051a2db9aeb6767ea40d1d2d4ea45df5d0c9139857631e7b96e51b81f4c93c3</citedby><cites>FETCH-LOGICAL-c801-74051a2db9aeb6767ea40d1d2d4ea45df5d0c9139857631e7b96e51b81f4c93c3</cites><orcidid>0000-0002-5929-8951</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,860,27903,27904</link.rule.ids></links><search><creatorcontrib>Vignesh, P. Preetham</creatorcontrib><creatorcontrib>Jiang, Jonathan H.</creatorcontrib><creatorcontrib>Kishore, P.</creatorcontrib><title>Predicting PM 2.5 Concentrations Across USA Using Machine Learning</title><title>Earth and space science (Hoboken, N.J.)</title><description>Economic growth, air pollution, and forest fires in some states in the United States have increased the concentration of particulate matter with a diameter less than or equal to 2.5 μm (PM2.5). Although previous studies have tried to observe PM2.5 both spatially and temporally using aerosol remote sensing and geostatistical estimation, they were limited in accuracy by coarse resolution. In this paper, the performance of machine learning models on predicting PM2.5 is assessed with linear regression (LR), decision tree (DT), gradient boosting regression (GBR), AdaBoost regression (ABR), XGBoost (XGB), k‐nearest neighbors (K‐NN), long short‐term memory (LSTM), random forest (RF), and support vector machine (SVM) using PM2.5 station data from 2017 to 2021. To compare the accuracy of all the nine machine learning models, the coefficient of determination (R²), root mean square error (RMSE), Nash‐Sutcliffe efficiency (NSE), root mean square error ratio (RSR), and percent bias (PBIAS) were evaluated. Among all nine models, the RF (100 decision trees with a max depth of 20) and support vector regression (SVR; nonlinear kernel, degree 3 polynomial) models were the best for predicting PM2.5 concentrations. Additionally, comparison of the PM2.5 performance metrics displayed that the models had better predictive behavior in the western United States than that in the eastern United States.
We present the prediction of PM2.5 concentrations over the United States using various machine learning (ML) algorithms
We show ML as a new approach for analyzing large data sets due to the computational speed and easy implementation for massive amounts of data
The study is important for improving our understanding of the differences among ML algorithms for Earth Science research</description><issn>2333-5084</issn><issn>2333-5084</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNpNkMtOwzAURC0EElXpjg_wB5Byr5_xMkTlIaWiEu06cmwHgsBBdjb8PSmw6GqORqPRaAi5RlgjMHPLgPFNBTMinpEF45wXEkpxfsKXZJXzOwAgkwqYWJC7XQp-cNMQX-luS9la0nqMLsQp2WkYY6aVS2PO9PBS0UM-xrbWvQ0x0CbYFGfjilz09iOH1b8uyf5-s68fi-b54amumsKVgIUWINEy3xkbOqWVDlaAR8-8mEn6XnpwBrkppVYcg-6MChK7EnvhDHd8SW7-an_3pNC3X2n4tOm7RWiPD7SnD_AfeGVLdw</recordid><startdate>202310</startdate><enddate>202310</enddate><creator>Vignesh, P. Preetham</creator><creator>Jiang, Jonathan H.</creator><creator>Kishore, P.</creator><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0002-5929-8951</orcidid></search><sort><creationdate>202310</creationdate><title>Predicting PM 2.5 Concentrations Across USA Using Machine Learning</title><author>Vignesh, P. Preetham ; Jiang, Jonathan H. ; Kishore, P.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c801-74051a2db9aeb6767ea40d1d2d4ea45df5d0c9139857631e7b96e51b81f4c93c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Vignesh, P. Preetham</creatorcontrib><creatorcontrib>Jiang, Jonathan H.</creatorcontrib><creatorcontrib>Kishore, P.</creatorcontrib><collection>CrossRef</collection><jtitle>Earth and space science (Hoboken, N.J.)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Vignesh, P. Preetham</au><au>Jiang, Jonathan H.</au><au>Kishore, P.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Predicting PM 2.5 Concentrations Across USA Using Machine Learning</atitle><jtitle>Earth and space science (Hoboken, N.J.)</jtitle><date>2023-10</date><risdate>2023</risdate><volume>10</volume><issue>10</issue><issn>2333-5084</issn><eissn>2333-5084</eissn><abstract>Economic growth, air pollution, and forest fires in some states in the United States have increased the concentration of particulate matter with a diameter less than or equal to 2.5 μm (PM2.5). Although previous studies have tried to observe PM2.5 both spatially and temporally using aerosol remote sensing and geostatistical estimation, they were limited in accuracy by coarse resolution. In this paper, the performance of machine learning models on predicting PM2.5 is assessed with linear regression (LR), decision tree (DT), gradient boosting regression (GBR), AdaBoost regression (ABR), XGBoost (XGB), k‐nearest neighbors (K‐NN), long short‐term memory (LSTM), random forest (RF), and support vector machine (SVM) using PM2.5 station data from 2017 to 2021. To compare the accuracy of all the nine machine learning models, the coefficient of determination (R²), root mean square error (RMSE), Nash‐Sutcliffe efficiency (NSE), root mean square error ratio (RSR), and percent bias (PBIAS) were evaluated. Among all nine models, the RF (100 decision trees with a max depth of 20) and support vector regression (SVR; nonlinear kernel, degree 3 polynomial) models were the best for predicting PM2.5 concentrations. Additionally, comparison of the PM2.5 performance metrics displayed that the models had better predictive behavior in the western United States than that in the eastern United States.
We present the prediction of PM2.5 concentrations over the United States using various machine learning (ML) algorithms
We show ML as a new approach for analyzing large data sets due to the computational speed and easy implementation for massive amounts of data
The study is important for improving our understanding of the differences among ML algorithms for Earth Science research</abstract><doi>10.1029/2023EA002911</doi><orcidid>https://orcid.org/0000-0002-5929-8951</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2333-5084 |
ispartof | Earth and space science (Hoboken, N.J.), 2023-10, Vol.10 (10) |
issn | 2333-5084 2333-5084 |
language | eng |
recordid | cdi_crossref_primary_10_1029_2023EA002911 |
source | Wiley Online Library Open Access; DOAJ Directory of Open Access Journals; Wiley Online Library Journals Frontfile Complete; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
title | Predicting PM 2.5 Concentrations Across USA Using Machine Learning |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T04%3A46%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Predicting%20PM%202.5%20Concentrations%20Across%20USA%20Using%20Machine%20Learning&rft.jtitle=Earth%20and%20space%20science%20(Hoboken,%20N.J.)&rft.au=Vignesh,%20P.%20Preetham&rft.date=2023-10&rft.volume=10&rft.issue=10&rft.issn=2333-5084&rft.eissn=2333-5084&rft_id=info:doi/10.1029/2023EA002911&rft_dat=%3Ccrossref%3E10_1029_2023EA002911%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
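
The description field above names five evaluation metrics (R², RMSE, NSE, RSR, PBIAS) without giving their formulas. The following is a minimal sketch of how they are commonly computed, not the authors' code. It assumes R² is the squared Pearson correlation, RSR is RMSE divided by the standard deviation of the observations, and PBIAS uses the sign convention 100 · Σ(observed − predicted) / Σ(observed); the record does not confirm any of these choices. The function name `evaluation_metrics` and the sample values are hypothetical.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return R2, RMSE, NSE, RSR and PBIAS for paired observations and predictions.

    Assumed definitions (not stated in the catalog record):
      R2    = squared Pearson correlation
      RMSE  = sqrt(mean squared error)
      NSE   = 1 - SS_res / SS_obs
      RSR   = RMSE / standard deviation of the observations
      PBIAS = 100 * sum(observed - predicted) / sum(observed)
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 values")

    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Sums of squares shared by several metrics.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))

    if ss_obs == 0 or ss_pred == 0 or sum(observed) == 0:
        raise ValueError("metrics are undefined for constant or zero-sum series")

    return {
        "r2": cov ** 2 / (ss_obs * ss_pred),
        "rmse": math.sqrt(ss_res / n),
        "nse": 1.0 - ss_res / ss_obs,
        "rsr": math.sqrt(ss_res) / math.sqrt(ss_obs),
        "pbias": 100.0 * sum(o - p for o, p in zip(observed, predicted)) / sum(observed),
    }


if __name__ == "__main__":
    # Illustrative values only; these are not data from the paper.
    obs = [8.0, 12.5, 9.1, 15.3, 11.0]
    pred = [7.5, 13.0, 9.8, 14.1, 11.6]
    for name, value in evaluation_metrics(obs, pred).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions NSE and RSR are tied together by NSE = 1 − RSR², so the two metrics rank models identically; PBIAS adds the direction of the bias, which the others do not capture.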