The effect of sample size on different machine learning models for groundwater potential mapping in mountain bedrock aquifers

Bibliographic details
Published in: Catena (Giessen), 2020-04, Vol. 187, p. 104421, Article 104421
Main authors: Moghaddam, Davoud Davoudi; Rahmati, Omid; Panahi, Mahdi; Tiefenbacher, John; Darabi, Hamid; Haghizadeh, Ali; Haghighi, Ali Torabi; Nalivan, Omid Asadi; Tien Bui, Dieu
Format: Article
Language: English
Description
Summary:
• RF and ANFIS-ICA models were less sensitive to reductions in sample size.
• RF outperformed the other models in terms of goodness-of-fit and predictive performance.
• Relative slope position (RSP) and lithology were the main geo-environmental spring-affecting factors.
• Approximately 18.68% of the study area has high or very high groundwater potential.
Machine learning models have attracted much research attention for groundwater potential mapping. However, the accuracy of these models is significantly influenced by sample size, which remains a challenge. This study evaluates the influence of sample size on the accuracy of individual and hybrid models for groundwater potential mapping: adaptive neuro-fuzzy inference system (ANFIS), ANFIS-imperialist competitive algorithm (ANFIS-ICA), alternating decision tree (ADT), and random forest (RF), with the number of springs ranging from 177 to 714. A well-documented inventory of springs, as a natural indicator of groundwater potential, was used to designate four sample data sets: 100% (D1), 75% (D2), 50% (D3), and 25% (D4) of the entire spring inventory. Each data set was randomly split into two groups of 70% (for training) and 30% (for validation). Fifteen diverse geo-environmental factors were employed as independent variables. The area under the receiver operating characteristic curve (AUROC) and the true skill statistic (TSS), a cutoff-independent and a cutoff-dependent performance metric respectively, were used to assess model performance. Results showed that sample size influenced the performance of all four machine learning algorithms, but RF was less sensitive to reductions in sample size.
In addition, validation results revealed that RF (AUROC = 90.74–96.32%, TSS = 0.79–0.85) had the best performance based on all four sample data sets, followed by ANFIS-ICA (AUROC = 81.23–91.55%, TSS = 0.74–0.81), ADT (AUROC = 79.29–88.46%, TSS = 0.59–0.74), and ANFIS (AUROC = 73.11–88.43%, TSS = 0.59–0.74). Further, the relative slope position, lithology, and distance from faults were the main spring-affecting factors contributing to groundwater potential modelling. This study can provide useful guidelines and a valuable reference for selecting machine learning models when a complete spring inventory in a watershed is unavailable.
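The two performance metrics named in the summary can be illustrated with a minimal Python sketch. It is not taken from the article: the function names, the toy labels and scores, and the 0.5 default cutoff are all assumptions made for illustration. It uses the standard definitions, TSS = sensitivity + specificity - 1 at a chosen cutoff, and AUROC as the probability that a randomly chosen positive (spring) location receives a higher score than a randomly chosen negative one.

```python
from typing import Sequence


def true_skill_statistic(y_true: Sequence[int], scores: Sequence[float],
                         cutoff: float = 0.5) -> float:
    """TSS = sensitivity + specificity - 1 at a fixed cutoff (cutoff-dependent)."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= cutoff else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise ranking (cutoff-independent); ties count as 0.5."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = spring present, 0 = spring absent.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

print(true_skill_statistic(labels, scores, cutoff=0.5))   # 0.5
print(true_skill_statistic(labels, scores, cutoff=0.35))  # 0.75
print(auroc(labels, scores))                              # 0.875
```

Changing the cutoff changes TSS but leaves AUROC untouched, which is why the two metrics are reported together. The summary gives AUROC as a percentage, so the value returned here would be multiplied by 100 for comparison.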
ISSN: 0341-8162, 1872-6887
DOI: 10.1016/j.catena.2019.104421