Machine learning prediction of health risk and spatial dependence of geogenic contaminated groundwater from the Hetao Basin, China

Bibliographic details
Published in: Journal of geochemical exploration, 2024-07, Vol. 262, p. 107497, Article 107497
Main authors: Xia, Peng; Zhao, Yifu; Xie, Xianjun; Li, Junxia; Qian, Kun; You, Haoyu; Zhang, Jingxian; Ge, Weili; Pan, Hongjie; Wang, Yanxin
Format: Article
Language: English
Online access: Full text
Description
Abstract: Geogenic contaminated groundwater (GCG), characterized by elevated arsenic, fluoride, and iodine levels, presents a significant challenge to public health and government management. Conventional survey-based approaches, which collect groundwater samples, conduct physicochemical tests, and perform spatial interpolation to obtain regional maps of groundwater chemical components, are inefficient and costly. More importantly, they do not take into account the actual hydrogeological conditions or the characteristics of pollutant transport and enrichment. To address this issue, we utilized Support Vector Machine (SVM), Random Forest (RF), Adaptive Boosting (AdaBoost), and Extreme Gradient Boosting (XGBoost) to analyze the likelihood of occurrence of arsenic, fluoride, and iodine as well as their spatial distribution in shallow groundwater from the Hetao Basin. Our study incorporated 20 indicators related to meteorology, soil physicochemical properties, and groundwater conditions, along with 1505 labeled samples consisting of groundwater arsenic, fluoride, and iodine concentrations and their corresponding coordinates. The study then automatically analyzed the meteorological, soil physicochemical, and groundwater conditions by constructing machine learning models from the available data. To optimise and select the best prediction model, this paper presents a quantitative evaluation of the prediction performance of the various machine learning models. The accuracy (AC), area under curve (AUC), and mean squared error (MSE) were calculated to assess predictions of the spatial distribution of GCG, and the optimized model for predicting that distribution was then selected.
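The model comparison described above can be illustrated with a minimal sketch of the three metrics. The abstract does not state how each metric was computed, so the definitions here are assumptions: accuracy uses a 0.5 probability cutoff, AUC is the pairwise rank formulation, and MSE is taken between predicted exceedance probabilities and 0/1 labels. The labels, the probabilities, and the names `model_a` and `model_b` are hypothetical and do not come from the study.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_prob: Sequence[float], cutoff: float = 0.5) -> float:
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(int((p >= cutoff) == bool(t)) for t, p in zip(y_true, y_prob))
    return hits / len(y_true)


def mse(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative.

    Ties count as half a win. This O(n_pos * n_neg) form is fine for a sketch.
    """
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels: 1 = concentration exceeds the chosen limit, 0 = below it.
y_true = [1, 0, 1, 1, 0, 0]

# Hypothetical predicted exceedance probabilities from two candidate models.
predictions: Dict[str, Sequence[float]] = {
    "model_a": [0.9, 0.2, 0.7, 0.4, 0.3, 0.1],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.8, 0.2],
}

scores = {
    name: {
        "AC": accuracy(y_true, probs),
        "AUC": auc(y_true, probs),
        "MSE": mse(y_true, probs),
    }
    for name, probs in predictions.items()
}

# Assumed selection rule: highest AUC wins. The study's actual rule is not given.
best = max(scores, key=lambda name: scores[name]["AUC"])
print(best, scores[best])
```

Ranking by AUC alone is only one possible rule. A real comparison would compute the metrics on held-out data and might weigh all three together.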
The results showed that the XGBoost algorithm provided optimal predictions for groundwater with arsenic concentrations above 10 μg/L and fluoride concentrations exceeding 1.5 mg/L, whereas the RF model provided the best predictions for groundwater with arsenic concentrations surpassing 50 μg/L and iodine concentrations exceeding 100 μg/L. Subsequently, groundwater health risk zones were delineated based on an optimal prediction model, and demographic analysis was conducted in both the direct and potential groundwater risk zones. Model predictions indicated that hundreds of thousands of people in the Hetao Basin were facing a public health crisis caused by high concentrations of arsenic, fluoride and iodine in groundwater. These findings underscore the significant health c
ISSN:0375-6742
1879-1689
DOI:10.1016/j.gexplo.2024.107497