Comparative evaluation of machine learning models for groundwater quality assessment

Contamination from pesticides and nitrate in groundwater is a significant threat to water quality in general and agriculturally intensive regions in particular. Three widely used machine learning models, namely, artificial neural networks (ANN), support vector machines (SVM), and extreme gradient bo...

Bibliographic details
Published in: Environmental monitoring and assessment 2020-12, Vol.192 (12), p.776, Article 776
Main authors: Bedi, Shine, Samal, Ashok, Ray, Chittaranjan, Snow, Daniel
Format: Article
Language: eng
Online access: Full text
container_end_page
container_issue 12
container_start_page 776
container_title Environmental monitoring and assessment
container_volume 192
creator Bedi, Shine
Samal, Ashok
Ray, Chittaranjan
Snow, Daniel
description Contamination from pesticides and nitrate in groundwater is a significant threat to water quality in general and agriculturally intensive regions in particular. Three widely used machine learning models, namely, artificial neural networks (ANN), support vector machines (SVM), and extreme gradient boosting (XGB), were evaluated for their efficacy in predicting contamination levels using sparse data with non-linear relationships. The predictive ability of the models was assessed using a dataset consisting of 303 wells across 12 Midwestern states in the USA. Multiple hydrogeologic, water quality, and land use features were chosen as the independent variables, and classes were based on measured concentration ranges of nitrate and pesticide. This study evaluates the classification performance of the models for two, three, and four class scenarios and compares them with the corresponding regression models. The study also examines the issue of class imbalance and tests the efficacy of three class imbalance mitigation techniques: oversampling, weighting, and oversampling and weighting, for all the scenarios. The models’ performance is reported using multiple metrics, both insensitive to class imbalance (accuracy) and sensitive to class imbalance (F1 score and MCC). Finally, the study assesses the importance of features using game-theoretic Shapley values to rank features consistently and offer model interpretability.
doi_str_mv 10.1007/s10661-020-08695-3
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2473254571</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2473254571</sourcerecordid><originalsourceid>FETCH-LOGICAL-c419t-de83dd68a1c35c1f80144d2d0a031744e5c74aebf7e31077621577c5e5745b7a3</originalsourceid><addsrcrecordid>eNp9kMlOwzAQhi0EoqXwAhyQJc4GO96SI6rYJCQu5Wy58aSkSuLWTor69hhS4MZpRpp_0XwIXTJ6wyjVt5FRpRihGSU0V4Uk_AhNmdScZIUsjtGUMqWJ4qqYoLMY15TSQoviFE04z1iRKzFFi7lvNzbYvt4Bhp1thrT6DvsKt7Z8rzvADdjQ1d0Kt95BE3HlA14FP3Tuw_YQ8HawTd3vsY0RYmyh68_RSWWbCBeHOUNvD_eL-RN5eX18nt-9kFKwoicOcu6cyi0ruSxZlVMmhMsctZQzLQTIUgsLy0oDZ1RrlaXndClBaiGX2vIZuh5zN8FvB4i9WfshdKnSZELzTAqpWVJlo6oMPsYAldmEurVhbxg1XyDNCNIkkOYbpOHJdHWIHpYtuF_LD7kk4KMgplO3gvDX_U_sJ95Ofss</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2473254571</pqid></control><display><type>article</type><title>Comparative evaluation of machine learning models for groundwater quality assessment</title><source>MEDLINE</source><source>SpringerLink Journals - AutoHoldings</source><creator>Bedi, Shine ; Samal, Ashok ; Ray, Chittaranjan ; Snow, Daniel</creator><creatorcontrib>Bedi, Shine ; Samal, Ashok ; Ray, Chittaranjan ; Snow, Daniel</creatorcontrib><description>Contamination from pesticides and nitrate in groundwater is a significant threat to water quality in general and agriculturally intensive regions in particular. Three widely used machine learning models, namely, artificial neural networks (ANN), support vector machines (SVM), and extreme gradient boosting (XGB), were evaluated for their efficacy in predicting contamination levels using sparse data with non-linear relationships. The predictive ability of the models was assessed using a dataset consisting of 303 wells across 12 Midwestern states in the USA. 
Multiple hydrogeologic, water quality, and land use features were chosen as the independent variables, and classes were based on measured concentration ranges of nitrate and pesticide. This study evaluates the classification performance of the models for two, three, and four class scenarios and compares them with the corresponding regression models. The study also examines the issue of class imbalance and tests the efficacy of three class imbalance mitigation techniques: oversampling, weighting, and oversampling and weighting, for all the scenarios. The models’ performance is reported using multiple metrics, both insensitive to class imbalance (accuracy) and sensitive to class imbalance (F1 score and MCC). Finally, the study assesses the importance of features using game-theoretic Shapley values to rank features consistently and offer model interpretability.</description><identifier>ISSN: 0167-6369</identifier><identifier>EISSN: 1573-2959</identifier><identifier>DOI: 10.1007/s10661-020-08695-3</identifier><identifier>PMID: 33219864</identifier><language>eng</language><publisher>Cham: Springer International Publishing</publisher><subject>Artificial neural networks ; Atmospheric Protection/Air Quality Control/Air Pollution ; Contamination ; Earth and Environmental Science ; Ecology ; Ecotoxicology ; Environment ; Environmental Management ; Environmental Monitoring ; Environmental science ; Game theory ; Groundwater ; Groundwater quality ; Hydrogeology ; Independent variables ; Land use ; Learning algorithms ; Learning theory ; Machine Learning ; Mitigation ; Monitoring/Environmental Analysis ; Neural networks ; Neural Networks, Computer ; Oversampling ; Performance evaluation ; Pesticides ; Quality assessment ; Quality control ; Regression analysis ; Regression models ; Support Vector Machine ; Support vector machines ; Water quality ; Weighting</subject><ispartof>Environmental monitoring and assessment, 2020-12, Vol.192 (12), p.776, Article 
776</ispartof><rights>Springer Nature Switzerland AG 2020</rights><rights>Springer Nature Switzerland AG 2020.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c419t-de83dd68a1c35c1f80144d2d0a031744e5c74aebf7e31077621577c5e5745b7a3</citedby><cites>FETCH-LOGICAL-c419t-de83dd68a1c35c1f80144d2d0a031744e5c74aebf7e31077621577c5e5745b7a3</cites><orcidid>0000-0002-8558-9509</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10661-020-08695-3$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10661-020-08695-3$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27922,27923,41486,42555,51317</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33219864$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Bedi, Shine</creatorcontrib><creatorcontrib>Samal, Ashok</creatorcontrib><creatorcontrib>Ray, Chittaranjan</creatorcontrib><creatorcontrib>Snow, Daniel</creatorcontrib><title>Comparative evaluation of machine learning models for groundwater quality assessment</title><title>Environmental monitoring and assessment</title><addtitle>Environ Monit Assess</addtitle><addtitle>Environ Monit Assess</addtitle><description>Contamination from pesticides and nitrate in groundwater is a significant threat to water quality in general and agriculturally intensive regions in particular. Three widely used machine learning models, namely, artificial neural networks (ANN), support vector machines (SVM), and extreme gradient boosting (XGB), were evaluated for their efficacy in predicting contamination levels using sparse data with non-linear relationships. 
The predictive ability of the models was assessed using a dataset consisting of 303 wells across 12 Midwestern states in the USA. Multiple hydrogeologic, water quality, and land use features were chosen as the independent variables, and classes were based on measured concentration ranges of nitrate and pesticide. This study evaluates the classification performance of the models for two, three, and four class scenarios and compares them with the corresponding regression models. The study also examines the issue of class imbalance and tests the efficacy of three class imbalance mitigation techniques: oversampling, weighting, and oversampling and weighting, for all the scenarios. The models’ performance is reported using multiple metrics, both insensitive to class imbalance (accuracy) and sensitive to class imbalance (F1 score and MCC). Finally, the study assesses the importance of features using game-theoretic Shapley values to rank features consistently and offer model interpretability.</description><subject>Artificial neural networks</subject><subject>Atmospheric Protection/Air Quality Control/Air Pollution</subject><subject>Contamination</subject><subject>Earth and Environmental Science</subject><subject>Ecology</subject><subject>Ecotoxicology</subject><subject>Environment</subject><subject>Environmental Management</subject><subject>Environmental Monitoring</subject><subject>Environmental science</subject><subject>Game theory</subject><subject>Groundwater</subject><subject>Groundwater quality</subject><subject>Hydrogeology</subject><subject>Independent variables</subject><subject>Land use</subject><subject>Learning algorithms</subject><subject>Learning theory</subject><subject>Machine Learning</subject><subject>Mitigation</subject><subject>Monitoring/Environmental Analysis</subject><subject>Neural networks</subject><subject>Neural Networks, Computer</subject><subject>Oversampling</subject><subject>Performance 
evaluation</subject><subject>Pesticides</subject><subject>Quality assessment</subject><subject>Quality control</subject><subject>Regression analysis</subject><subject>Regression models</subject><subject>Support Vector Machine</subject><subject>Support vector machines</subject><subject>Water quality</subject><subject>Weighting</subject><issn>0167-6369</issn><issn>1573-2959</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp9kMlOwzAQhi0EoqXwAhyQJc4GO96SI6rYJCQu5Wy58aSkSuLWTor69hhS4MZpRpp_0XwIXTJ6wyjVt5FRpRihGSU0V4Uk_AhNmdScZIUsjtGUMqWJ4qqYoLMY15TSQoviFE04z1iRKzFFi7lvNzbYvt4Bhp1thrT6DvsKt7Z8rzvADdjQ1d0Kt95BE3HlA14FP3Tuw_YQ8HawTd3vsY0RYmyh68_RSWWbCBeHOUNvD_eL-RN5eX18nt-9kFKwoicOcu6cyi0ruSxZlVMmhMsctZQzLQTIUgsLy0oDZ1RrlaXndClBaiGX2vIZuh5zN8FvB4i9WfshdKnSZELzTAqpWVJlo6oMPsYAldmEurVhbxg1XyDNCNIkkOYbpOHJdHWIHpYtuF_LD7kk4KMgplO3gvDX_U_sJ95Ofss</recordid><startdate>20201201</startdate><enddate>20201201</enddate><creator>Bedi, Shine</creator><creator>Samal, Ashok</creator><creator>Ray, Chittaranjan</creator><creator>Snow, Daniel</creator><general>Springer International Publishing</general><general>Springer Nature 
B.V</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7QH</scope><scope>7QL</scope><scope>7SN</scope><scope>7ST</scope><scope>7T7</scope><scope>7TG</scope><scope>7TN</scope><scope>7U7</scope><scope>7UA</scope><scope>7WY</scope><scope>7WZ</scope><scope>7X7</scope><scope>7XB</scope><scope>87Z</scope><scope>88E</scope><scope>88I</scope><scope>8AO</scope><scope>8C1</scope><scope>8FD</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>8FL</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ATCPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BHPHI</scope><scope>C1K</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>F1W</scope><scope>FR3</scope><scope>FRNLG</scope><scope>FYUFA</scope><scope>F~G</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>H97</scope><scope>HCIFZ</scope><scope>K60</scope><scope>K6~</scope><scope>K9.</scope><scope>KL.</scope><scope>L.-</scope><scope>L.G</scope><scope>M0C</scope><scope>M0S</scope><scope>M1P</scope><scope>M2P</scope><scope>M7N</scope><scope>P64</scope><scope>PATMY</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PYCSY</scope><scope>Q9U</scope><scope>SOI</scope><orcidid>https://orcid.org/0000-0002-8558-9509</orcidid></search><sort><creationdate>20201201</creationdate><title>Comparative evaluation of machine learning models for groundwater quality assessment</title><author>Bedi, Shine ; Samal, Ashok ; Ray, Chittaranjan ; Snow, Daniel</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c419t-de83dd68a1c35c1f80144d2d0a031744e5c74aebf7e31077621577c5e5745b7a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Artificial neural networks</topic><topic>Atmospheric 
Protection/Air Quality Control/Air Pollution</topic><topic>Contamination</topic><topic>Earth and Environmental Science</topic><topic>Ecology</topic><topic>Ecotoxicology</topic><topic>Environment</topic><topic>Environmental Management</topic><topic>Environmental Monitoring</topic><topic>Environmental science</topic><topic>Game theory</topic><topic>Groundwater</topic><topic>Groundwater quality</topic><topic>Hydrogeology</topic><topic>Independent variables</topic><topic>Land use</topic><topic>Learning algorithms</topic><topic>Learning theory</topic><topic>Machine Learning</topic><topic>Mitigation</topic><topic>Monitoring/Environmental Analysis</topic><topic>Neural networks</topic><topic>Neural Networks, Computer</topic><topic>Oversampling</topic><topic>Performance evaluation</topic><topic>Pesticides</topic><topic>Quality assessment</topic><topic>Quality control</topic><topic>Regression analysis</topic><topic>Regression models</topic><topic>Support Vector Machine</topic><topic>Support vector machines</topic><topic>Water quality</topic><topic>Weighting</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Bedi, Shine</creatorcontrib><creatorcontrib>Samal, Ashok</creatorcontrib><creatorcontrib>Ray, Chittaranjan</creatorcontrib><creatorcontrib>Snow, Daniel</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Aqualine</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Ecology Abstracts</collection><collection>Environment Abstracts</collection><collection>Industrial and Applied Microbiology Abstracts (Microbiology A)</collection><collection>Meteorological &amp; Geoastrophysical Abstracts</collection><collection>Oceanic 
Abstracts</collection><collection>Toxicology Abstracts</collection><collection>Water Resources Abstracts</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Science Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Public Health Database</collection><collection>Technology Research Database</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central UK/Ireland</collection><collection>Agricultural &amp; Environmental Science Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Natural Science Collection</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ASFA: Aquatic Sciences and Fisheries Abstracts</collection><collection>Engineering Research Database</collection><collection>Business Premium Collection (Alumni)</collection><collection>Health Research Premium Collection</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>Aquatic Science &amp; Fisheries 
Abstracts (ASFA) 3: Aquatic Pollution &amp; Environmental Quality</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Meteorological &amp; Geoastrophysical Abstracts - Academic</collection><collection>ABI/INFORM Professional Advanced</collection><collection>Aquatic Science &amp; Fisheries Abstracts (ASFA) Professional</collection><collection>ABI/INFORM Global</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Science Database</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Environmental Science Database</collection><collection>ProQuest One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Environmental Science Collection</collection><collection>ProQuest Central Basic</collection><collection>Environment Abstracts</collection><jtitle>Environmental monitoring and assessment</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Bedi, Shine</au><au>Samal, Ashok</au><au>Ray, Chittaranjan</au><au>Snow, Daniel</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Comparative evaluation of machine learning models for groundwater quality assessment</atitle><jtitle>Environmental monitoring and assessment</jtitle><stitle>Environ Monit Assess</stitle><addtitle>Environ Monit 
Assess</addtitle><date>2020-12-01</date><risdate>2020</risdate><volume>192</volume><issue>12</issue><spage>776</spage><pages>776-</pages><artnum>776</artnum><issn>0167-6369</issn><eissn>1573-2959</eissn><abstract>Contamination from pesticides and nitrate in groundwater is a significant threat to water quality in general and agriculturally intensive regions in particular. Three widely used machine learning models, namely, artificial neural networks (ANN), support vector machines (SVM), and extreme gradient boosting (XGB), were evaluated for their efficacy in predicting contamination levels using sparse data with non-linear relationships. The predictive ability of the models was assessed using a dataset consisting of 303 wells across 12 Midwestern states in the USA. Multiple hydrogeologic, water quality, and land use features were chosen as the independent variables, and classes were based on measured concentration ranges of nitrate and pesticide. This study evaluates the classification performance of the models for two, three, and four class scenarios and compares them with the corresponding regression models. The study also examines the issue of class imbalance and tests the efficacy of three class imbalance mitigation techniques: oversampling, weighting, and oversampling and weighting, for all the scenarios. The models’ performance is reported using multiple metrics, both insensitive to class imbalance (accuracy) and sensitive to class imbalance (F1 score and MCC). Finally, the study assesses the importance of features using game-theoretic Shapley values to rank features consistently and offer model interpretability.</abstract><cop>Cham</cop><pub>Springer International Publishing</pub><pmid>33219864</pmid><doi>10.1007/s10661-020-08695-3</doi><orcidid>https://orcid.org/0000-0002-8558-9509</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0167-6369
ispartof Environmental monitoring and assessment, 2020-12, Vol.192 (12), p.776, Article 776
issn 0167-6369
1573-2959
language eng
recordid cdi_proquest_journals_2473254571
source MEDLINE; SpringerLink Journals - AutoHoldings
subjects Artificial neural networks
Atmospheric Protection/Air Quality Control/Air Pollution
Contamination
Earth and Environmental Science
Ecology
Ecotoxicology
Environment
Environmental Management
Environmental Monitoring
Environmental science
Game theory
Groundwater
Groundwater quality
Hydrogeology
Independent variables
Land use
Learning algorithms
Learning theory
Machine Learning
Mitigation
Monitoring/Environmental Analysis
Neural networks
Neural Networks, Computer
Oversampling
Performance evaluation
Pesticides
Quality assessment
Quality control
Regression analysis
Regression models
Support Vector Machine
Support vector machines
Water quality
Weighting
title Comparative evaluation of machine learning models for groundwater quality assessment
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T21%3A08%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Comparative%20evaluation%20of%20machine%20learning%20models%20for%20groundwater%20quality%20assessment&rft.jtitle=Environmental%20monitoring%20and%20assessment&rft.au=Bedi,%20Shine&rft.date=2020-12-01&rft.volume=192&rft.issue=12&rft.spage=776&rft.pages=776-&rft.artnum=776&rft.issn=0167-6369&rft.eissn=1573-2959&rft_id=info:doi/10.1007/s10661-020-08695-3&rft_dat=%3Cproquest_cross%3E2473254571%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2473254571&rft_id=info:pmid/33219864&rfr_iscdi=true
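
The description field of this record contrasts accuracy with F1 score and MCC under class imbalance. The following is a minimal illustrative sketch of that contrast, not the authors' code: the function name binary_metrics and both confusion matrices are hypothetical. Only the total of 303 wells comes from the record; the split into 30 contaminated and 273 clean wells is an assumption made for the example.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, F1, MCC) for a binary confusion matrix.

    F1 and MCC are defined as 0.0 here when their denominators are zero,
    which is a common convention, not something stated in the record.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total

    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return accuracy, f1, mcc


# Hypothetical case 1: a classifier that labels every well "clean".
# Assumed split (not from the record): 30 contaminated, 273 clean wells.
acc_all_clean, f1_all_clean, mcc_all_clean = binary_metrics(tp=0, fp=0, fn=30, tn=273)

# Hypothetical case 2: a classifier that finds 20 of the 30 contaminated wells
# at the cost of 10 false alarms.
acc_useful, f1_useful, mcc_useful = binary_metrics(tp=20, fp=10, fn=10, tn=263)

print(f"all-clean: acc={acc_all_clean:.3f} f1={f1_all_clean:.3f} mcc={mcc_all_clean:.3f}")
print(f"useful:    acc={acc_useful:.3f} f1={f1_useful:.3f} mcc={mcc_useful:.3f}")
```

Under these assumed counts, the trivial classifier still reaches roughly 90% accuracy while its F1 and MCC are both zero, which is one way to see why a study on imbalanced classes would report all three metrics.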