Comparative evaluation of machine learning models for groundwater quality assessment
Contamination from pesticides and nitrate in groundwater is a significant threat to water quality in general and agriculturally intensive regions in particular. Three widely used machine learning models, namely, artificial neural networks (ANN), support vector machines (SVM), and extreme gradient bo...
Published in: | Environmental monitoring and assessment, 2020-12, Vol. 192 (12), p. 776, Article 776 |
---|---|
Main authors: | Bedi, Shine ; Samal, Ashok ; Ray, Chittaranjan ; Snow, Daniel |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 12 |
container_start_page | 776 |
container_title | Environmental monitoring and assessment |
container_volume | 192 |
creator | Bedi, Shine ; Samal, Ashok ; Ray, Chittaranjan ; Snow, Daniel |
description | Contamination from pesticides and nitrate in groundwater is a significant threat to water quality in general and agriculturally intensive regions in particular. Three widely used machine learning models, namely, artificial neural networks (ANN), support vector machines (SVM), and extreme gradient boosting (XGB), were evaluated for their efficacy in predicting contamination levels using sparse data with non-linear relationships. The predictive ability of the models was assessed using a dataset consisting of 303 wells across 12 Midwestern states in the USA. Multiple hydrogeologic, water quality, and land use features were chosen as the independent variables, and classes were based on measured concentration ranges of nitrate and pesticide. This study evaluates the classification performance of the models for two, three, and four class scenarios and compares them with the corresponding regression models. The study also examines the issue of class imbalance and tests the efficacy of three class imbalance mitigation techniques: oversampling, weighting, and oversampling and weighting, for all the scenarios. The models’ performance is reported using multiple metrics, both insensitive to class imbalance (accuracy) and sensitive to class imbalance (F1 score and MCC). Finally, the study assesses the importance of features using game-theoretic Shapley values to rank features consistently and offer model interpretability. |
doi_str_mv | 10.1007/s10661-020-08695-3 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2473254571</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2473254571</sourcerecordid><originalsourceid>FETCH-LOGICAL-c419t-de83dd68a1c35c1f80144d2d0a031744e5c74aebf7e31077621577c5e5745b7a3</originalsourceid><addsrcrecordid>eNp9kMlOwzAQhi0EoqXwAhyQJc4GO96SI6rYJCQu5Wy58aSkSuLWTor69hhS4MZpRpp_0XwIXTJ6wyjVt5FRpRihGSU0V4Uk_AhNmdScZIUsjtGUMqWJ4qqYoLMY15TSQoviFE04z1iRKzFFi7lvNzbYvt4Bhp1thrT6DvsKt7Z8rzvADdjQ1d0Kt95BE3HlA14FP3Tuw_YQ8HawTd3vsY0RYmyh68_RSWWbCBeHOUNvD_eL-RN5eX18nt-9kFKwoicOcu6cyi0ruSxZlVMmhMsctZQzLQTIUgsLy0oDZ1RrlaXndClBaiGX2vIZuh5zN8FvB4i9WfshdKnSZELzTAqpWVJlo6oMPsYAldmEurVhbxg1XyDNCNIkkOYbpOHJdHWIHpYtuF_LD7kk4KMgplO3gvDX_U_sJ95Ofss</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2473254571</pqid></control><display><type>article</type><title>Comparative evaluation of machine learning models for groundwater quality assessment</title><source>MEDLINE</source><source>SpringerLink Journals - AutoHoldings</source><creator>Bedi, Shine ; Samal, Ashok ; Ray, Chittaranjan ; Snow, Daniel</creator><creatorcontrib>Bedi, Shine ; Samal, Ashok ; Ray, Chittaranjan ; Snow, Daniel</creatorcontrib><description>Contamination from pesticides and nitrate in groundwater is a significant threat to water quality in general and agriculturally intensive regions in particular. Three widely used machine learning models, namely, artificial neural networks (ANN), support vector machines (SVM), and extreme gradient boosting (XGB), were evaluated for their efficacy in predicting contamination levels using sparse data with non-linear relationships. The predictive ability of the models was assessed using a dataset consisting of 303 wells across 12 Midwestern states in the USA. 
Multiple hydrogeologic, water quality, and land use features were chosen as the independent variables, and classes were based on measured concentration ranges of nitrate and pesticide. This study evaluates the classification performance of the models for two, three, and four class scenarios and compares them with the corresponding regression models. The study also examines the issue of class imbalance and tests the efficacy of three class imbalance mitigation techniques: oversampling, weighting, and oversampling and weighting, for all the scenarios. The models’ performance is reported using multiple metrics, both insensitive to class imbalance (accuracy) and sensitive to class imbalance (F1 score and MCC). Finally, the study assesses the importance of features using game-theoretic Shapley values to rank features consistently and offer model interpretability.</description><identifier>ISSN: 0167-6369</identifier><identifier>EISSN: 1573-2959</identifier><identifier>DOI: 10.1007/s10661-020-08695-3</identifier><identifier>PMID: 33219864</identifier><language>eng</language><publisher>Cham: Springer International Publishing</publisher><subject>Artificial neural networks ; Atmospheric Protection/Air Quality Control/Air Pollution ; Contamination ; Earth and Environmental Science ; Ecology ; Ecotoxicology ; Environment ; Environmental Management ; Environmental Monitoring ; Environmental science ; Game theory ; Groundwater ; Groundwater quality ; Hydrogeology ; Independent variables ; Land use ; Learning algorithms ; Learning theory ; Machine Learning ; Mitigation ; Monitoring/Environmental Analysis ; Neural networks ; Neural Networks, Computer ; Oversampling ; Performance evaluation ; Pesticides ; Quality assessment ; Quality control ; Regression analysis ; Regression models ; Support Vector Machine ; Support vector machines ; Water quality ; Weighting</subject><ispartof>Environmental monitoring and assessment, 2020-12, Vol.192 (12), p.776, Article 
776</ispartof><rights>Springer Nature Switzerland AG 2020</rights><rights>Springer Nature Switzerland AG 2020.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c419t-de83dd68a1c35c1f80144d2d0a031744e5c74aebf7e31077621577c5e5745b7a3</citedby><cites>FETCH-LOGICAL-c419t-de83dd68a1c35c1f80144d2d0a031744e5c74aebf7e31077621577c5e5745b7a3</cites><orcidid>0000-0002-8558-9509</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10661-020-08695-3$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10661-020-08695-3$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27922,27923,41486,42555,51317</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33219864$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Bedi, Shine</creatorcontrib><creatorcontrib>Samal, Ashok</creatorcontrib><creatorcontrib>Ray, Chittaranjan</creatorcontrib><creatorcontrib>Snow, Daniel</creatorcontrib><title>Comparative evaluation of machine learning models for groundwater quality assessment</title><title>Environmental monitoring and assessment</title><addtitle>Environ Monit Assess</addtitle><addtitle>Environ Monit Assess</addtitle><description>Contamination from pesticides and nitrate in groundwater is a significant threat to water quality in general and agriculturally intensive regions in particular. Three widely used machine learning models, namely, artificial neural networks (ANN), support vector machines (SVM), and extreme gradient boosting (XGB), were evaluated for their efficacy in predicting contamination levels using sparse data with non-linear relationships. 
The predictive ability of the models was assessed using a dataset consisting of 303 wells across 12 Midwestern states in the USA. Multiple hydrogeologic, water quality, and land use features were chosen as the independent variables, and classes were based on measured concentration ranges of nitrate and pesticide. This study evaluates the classification performance of the models for two, three, and four class scenarios and compares them with the corresponding regression models. The study also examines the issue of class imbalance and tests the efficacy of three class imbalance mitigation techniques: oversampling, weighting, and oversampling and weighting, for all the scenarios. The models’ performance is reported using multiple metrics, both insensitive to class imbalance (accuracy) and sensitive to class imbalance (F1 score and MCC). Finally, the study assesses the importance of features using game-theoretic Shapley values to rank features consistently and offer model interpretability.</description><subject>Artificial neural networks</subject><subject>Atmospheric Protection/Air Quality Control/Air Pollution</subject><subject>Contamination</subject><subject>Earth and Environmental Science</subject><subject>Ecology</subject><subject>Ecotoxicology</subject><subject>Environment</subject><subject>Environmental Management</subject><subject>Environmental Monitoring</subject><subject>Environmental science</subject><subject>Game theory</subject><subject>Groundwater</subject><subject>Groundwater quality</subject><subject>Hydrogeology</subject><subject>Independent variables</subject><subject>Land use</subject><subject>Learning algorithms</subject><subject>Learning theory</subject><subject>Machine Learning</subject><subject>Mitigation</subject><subject>Monitoring/Environmental Analysis</subject><subject>Neural networks</subject><subject>Neural Networks, Computer</subject><subject>Oversampling</subject><subject>Performance 
evaluation</subject><subject>Pesticides</subject><subject>Quality assessment</subject><subject>Quality control</subject><subject>Regression analysis</subject><subject>Regression models</subject><subject>Support Vector Machine</subject><subject>Support vector machines</subject><subject>Water quality</subject><subject>Weighting</subject><issn>0167-6369</issn><issn>1573-2959</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp9kMlOwzAQhi0EoqXwAhyQJc4GO96SI6rYJCQu5Wy58aSkSuLWTor69hhS4MZpRpp_0XwIXTJ6wyjVt5FRpRihGSU0V4Uk_AhNmdScZIUsjtGUMqWJ4qqYoLMY15TSQoviFE04z1iRKzFFi7lvNzbYvt4Bhp1thrT6DvsKt7Z8rzvADdjQ1d0Kt95BE3HlA14FP3Tuw_YQ8HawTd3vsY0RYmyh68_RSWWbCBeHOUNvD_eL-RN5eX18nt-9kFKwoicOcu6cyi0ruSxZlVMmhMsctZQzLQTIUgsLy0oDZ1RrlaXndClBaiGX2vIZuh5zN8FvB4i9WfshdKnSZELzTAqpWVJlo6oMPsYAldmEurVhbxg1XyDNCNIkkOYbpOHJdHWIHpYtuF_LD7kk4KMgplO3gvDX_U_sJ95Ofss</recordid><startdate>20201201</startdate><enddate>20201201</enddate><creator>Bedi, Shine</creator><creator>Samal, Ashok</creator><creator>Ray, Chittaranjan</creator><creator>Snow, Daniel</creator><general>Springer International Publishing</general><general>Springer Nature 
B.V</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7QH</scope><scope>7QL</scope><scope>7SN</scope><scope>7ST</scope><scope>7T7</scope><scope>7TG</scope><scope>7TN</scope><scope>7U7</scope><scope>7UA</scope><scope>7WY</scope><scope>7WZ</scope><scope>7X7</scope><scope>7XB</scope><scope>87Z</scope><scope>88E</scope><scope>88I</scope><scope>8AO</scope><scope>8C1</scope><scope>8FD</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>8FL</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ATCPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BHPHI</scope><scope>C1K</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>F1W</scope><scope>FR3</scope><scope>FRNLG</scope><scope>FYUFA</scope><scope>F~G</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>H97</scope><scope>HCIFZ</scope><scope>K60</scope><scope>K6~</scope><scope>K9.</scope><scope>KL.</scope><scope>L.-</scope><scope>L.G</scope><scope>M0C</scope><scope>M0S</scope><scope>M1P</scope><scope>M2P</scope><scope>M7N</scope><scope>P64</scope><scope>PATMY</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PYCSY</scope><scope>Q9U</scope><scope>SOI</scope><orcidid>https://orcid.org/0000-0002-8558-9509</orcidid></search><sort><creationdate>20201201</creationdate><title>Comparative evaluation of machine learning models for groundwater quality assessment</title><author>Bedi, Shine ; Samal, Ashok ; Ray, Chittaranjan ; Snow, Daniel</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c419t-de83dd68a1c35c1f80144d2d0a031744e5c74aebf7e31077621577c5e5745b7a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Artificial neural networks</topic><topic>Atmospheric 
Protection/Air Quality Control/Air Pollution</topic><topic>Contamination</topic><topic>Earth and Environmental Science</topic><topic>Ecology</topic><topic>Ecotoxicology</topic><topic>Environment</topic><topic>Environmental Management</topic><topic>Environmental Monitoring</topic><topic>Environmental science</topic><topic>Game theory</topic><topic>Groundwater</topic><topic>Groundwater quality</topic><topic>Hydrogeology</topic><topic>Independent variables</topic><topic>Land use</topic><topic>Learning algorithms</topic><topic>Learning theory</topic><topic>Machine Learning</topic><topic>Mitigation</topic><topic>Monitoring/Environmental Analysis</topic><topic>Neural networks</topic><topic>Neural Networks, Computer</topic><topic>Oversampling</topic><topic>Performance evaluation</topic><topic>Pesticides</topic><topic>Quality assessment</topic><topic>Quality control</topic><topic>Regression analysis</topic><topic>Regression models</topic><topic>Support Vector Machine</topic><topic>Support vector machines</topic><topic>Water quality</topic><topic>Weighting</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Bedi, Shine</creatorcontrib><creatorcontrib>Samal, Ashok</creatorcontrib><creatorcontrib>Ray, Chittaranjan</creatorcontrib><creatorcontrib>Snow, Daniel</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Aqualine</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Ecology Abstracts</collection><collection>Environment Abstracts</collection><collection>Industrial and Applied Microbiology Abstracts (Microbiology A)</collection><collection>Meteorological & Geoastrophysical Abstracts</collection><collection>Oceanic 
Abstracts</collection><collection>Toxicology Abstracts</collection><collection>Water Resources Abstracts</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Science Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Public Health Database</collection><collection>Technology Research Database</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central UK/Ireland</collection><collection>Agricultural & Environmental Science Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Natural Science Collection</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ASFA: Aquatic Sciences and Fisheries Abstracts</collection><collection>Engineering Research Database</collection><collection>Business Premium Collection (Alumni)</collection><collection>Health Research Premium Collection</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>Aquatic Science & Fisheries Abstracts (ASFA) 
3: Aquatic Pollution & Environmental Quality</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Meteorological & Geoastrophysical Abstracts - Academic</collection><collection>ABI/INFORM Professional Advanced</collection><collection>Aquatic Science & Fisheries Abstracts (ASFA) Professional</collection><collection>ABI/INFORM Global</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Science Database</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Environmental Science Database</collection><collection>ProQuest One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Environmental Science Collection</collection><collection>ProQuest Central Basic</collection><collection>Environment Abstracts</collection><jtitle>Environmental monitoring and assessment</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Bedi, Shine</au><au>Samal, Ashok</au><au>Ray, Chittaranjan</au><au>Snow, Daniel</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Comparative evaluation of machine learning models for groundwater quality assessment</atitle><jtitle>Environmental monitoring and assessment</jtitle><stitle>Environ Monit Assess</stitle><addtitle>Environ Monit 
Assess</addtitle><date>2020-12-01</date><risdate>2020</risdate><volume>192</volume><issue>12</issue><spage>776</spage><pages>776-</pages><artnum>776</artnum><issn>0167-6369</issn><eissn>1573-2959</eissn><abstract>Contamination from pesticides and nitrate in groundwater is a significant threat to water quality in general and agriculturally intensive regions in particular. Three widely used machine learning models, namely, artificial neural networks (ANN), support vector machines (SVM), and extreme gradient boosting (XGB), were evaluated for their efficacy in predicting contamination levels using sparse data with non-linear relationships. The predictive ability of the models was assessed using a dataset consisting of 303 wells across 12 Midwestern states in the USA. Multiple hydrogeologic, water quality, and land use features were chosen as the independent variables, and classes were based on measured concentration ranges of nitrate and pesticide. This study evaluates the classification performance of the models for two, three, and four class scenarios and compares them with the corresponding regression models. The study also examines the issue of class imbalance and tests the efficacy of three class imbalance mitigation techniques: oversampling, weighting, and oversampling and weighting, for all the scenarios. The models’ performance is reported using multiple metrics, both insensitive to class imbalance (accuracy) and sensitive to class imbalance (F1 score and MCC). Finally, the study assesses the importance of features using game-theoretic Shapley values to rank features consistently and offer model interpretability.</abstract><cop>Cham</cop><pub>Springer International Publishing</pub><pmid>33219864</pmid><doi>10.1007/s10661-020-08695-3</doi><orcidid>https://orcid.org/0000-0002-8558-9509</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0167-6369 |
ispartof | Environmental monitoring and assessment, 2020-12, Vol.192 (12), p.776, Article 776 |
issn | 0167-6369 ; 1573-2959 |
language | eng |
recordid | cdi_proquest_journals_2473254571 |
source | MEDLINE; SpringerLink Journals - AutoHoldings |
subjects | Artificial neural networks ; Atmospheric Protection/Air Quality Control/Air Pollution ; Contamination ; Earth and Environmental Science ; Ecology ; Ecotoxicology ; Environment ; Environmental Management ; Environmental Monitoring ; Environmental science ; Game theory ; Groundwater ; Groundwater quality ; Hydrogeology ; Independent variables ; Land use ; Learning algorithms ; Learning theory ; Machine Learning ; Mitigation ; Monitoring/Environmental Analysis ; Neural networks ; Neural Networks, Computer ; Oversampling ; Performance evaluation ; Pesticides ; Quality assessment ; Quality control ; Regression analysis ; Regression models ; Support Vector Machine ; Support vector machines ; Water quality ; Weighting |
title | Comparative evaluation of machine learning models for groundwater quality assessment |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T21%3A08%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Comparative%20evaluation%20of%20machine%20learning%20models%20for%20groundwater%20quality%20assessment&rft.jtitle=Environmental%20monitoring%20and%20assessment&rft.au=Bedi,%20Shine&rft.date=2020-12-01&rft.volume=192&rft.issue=12&rft.spage=776&rft.pages=776-&rft.artnum=776&rft.issn=0167-6369&rft.eissn=1573-2959&rft_id=info:doi/10.1007/s10661-020-08695-3&rft_dat=%3Cproquest_cross%3E2473254571%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2473254571&rft_id=info:pmid/33219864&rfr_iscdi=true |