Performance evaluation of the GIS-based data mining techniques of best-first decision tree, random forest, and naïve Bayes tree for landslide susceptibility modeling


Bibliographic details
Published in: The Science of the total environment 2018-12, Vol.644, p.1006-1018
Main authors: Chen, Wei; Zhang, Shuai; Li, Renwei; Shahabi, Himan
Format: Article
Language: English
Online access: Full text
Description
Summary: The main aim of the present study is to explore and compare three state-of-the-art data mining techniques, best-first decision tree, random forest, and naïve Bayes tree, for landslide susceptibility assessment in the Longhai area of China. First, a landslide inventory map with 93 landslide locations was randomly divided, with 70% of the area used for training landslide models and 30% used for the validation process. A spatial database of 14 conditioning factors was constructed under a geographic information system environment. Subsequently, the ReliefF method was employed to assess the prediction capability of the conditioning factors in landslide models. Multicollinearity of these factors was verified using the variance inflation factor, tolerance, and Pearson's correlation coefficient. Finally, the three resulting models were evaluated and compared using the area under the receiver operating characteristic (AUROC) curve, standard error, 95% confidence interval, accuracy, precision, recall, and F-measure. The random forest model showed the highest AUROC value (0.869), smallest standard error (0.025), narrowest 95% confidence interval (0.819–0.918), highest accuracy value (0.774), highest precision (0.662), and highest F-measure (0.662) for the training dataset. Thus, the random forest model is a promising technique that could be used for landslide susceptibility mapping.
•The effectiveness of three advanced models is compared.
•Predictive capability and multicollinearity of landslide factors are analyzed.
•The performance of the maps has been validated against historical landslide data.
•The RF model outperforms the other models.
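The evaluation measures named in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not say how the standard error was estimated, so the Hanley-McNeil approximation is assumed here, and the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical. Applying the usual normal-approximation interval (AUROC ± 1.96 × standard error) to the reported values 0.869 and 0.025 gives roughly 0.820 to 0.918, which agrees with the reported interval to within rounding.

```python
import math


def auroc(labels, scores):
    """AUROC as the probability that a random positive scores higher
    than a random negative (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of the AUROC (Hanley-McNeil approximation;
    an assumption, the record does not name the estimator)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)


def confidence_interval(auc, se, z=1.96):
    """Normal-approximation interval; z=1.96 gives a 95% interval."""
    return auc - z * se, auc + z * se


def classification_metrics(labels, scores, threshold=0.5):
    """Accuracy, precision, recall, and F-measure at a fixed threshold
    (0.5 is a hypothetical choice)."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, pred) if y == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Toy data (hypothetical): 1 = landslide cell, 0 = non-landslide cell.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

toy_auc = auroc(toy_labels, toy_scores)
toy_se = hanley_mcneil_se(toy_auc, n_pos=4, n_neg=4)
toy_metrics = classification_metrics(toy_labels, toy_scores)

# Interval implied by the AUROC and standard error quoted in the summary.
reported_ci = confidence_interval(0.869, 0.025)
```

With only eight toy samples the standard error is large; in practice the same functions would be applied to the susceptibility scores of the training and validation cells.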
ISSN:0048-9697
1879-1026
DOI:10.1016/j.scitotenv.2018.06.389