Ensemble Conditioning Factor Selection with Markov Chain Framework for Shallow Landslide Susceptibility Mapping in Lake Sapanca Basin and its Vicinity, Turkey

Bibliographic details
Published in: Baltic Journal of Modern Computing 2022, Vol. 10 (2), pp. 224-240
Main authors: Kavzoglu, Taskin; Teke, Alihan
Format: Article
Language: English
Description
Abstract: The selection of landslide predisposing factors usually involves a certain degree of subjectivity, which sometimes adversely affects the performance of the resulting predictive models. Although filter-based feature selection algorithms have been widely used to discard irrelevant factors from geospatial databases, they suffer considerably from statistical biases. Another important limitation is the uncertainty about which feature selection method to choose from the wide array of available options. In this study, an ensemble feature selection strategy based on the Markov Chain framework was proposed to find an optimal factor subset from the results of filter-based factor selection. To this end, 21 landslide conditioning factors were initially considered, and seven well-known filter-based feature selection techniques were used to determine factor importance scores. Using scree plot analysis, the proposed ensemble approach produced an optimal subset of seven conditioning factors after eliminating 14 factors (i.e., a reduction of about 66%). The random forest (RF) algorithm was then used to predict landslide susceptibility with both the optimal factor subset and the full set of factors. Validation showed that the optimal subset yielded an overall accuracy (OA) of 90.983% and an area under the curve (AUC) of 94.561%. The RF model fed with the optimal subset outperformed the model using all factors by more than 6% in terms of AUC. McNemar's test confirmed that the performance differences were statistically significant in all cases.
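The abstract does not spell out how the Markov Chain framework merges the seven filter-based rankings into one. A well-known way to do this is MC4-style rank aggregation (Dwork et al., 2001): each factor is a state, the chain moves from factor i to a uniformly drawn factor j only if a majority of the rankings place j above i, and the stationary distribution of the chain serves as the ensemble importance score. The sketch below assumes that scheme and is not the authors' implementation. The damping term (added so the chain has a unique stationary distribution), the largest-gap cutoff used as a stand-in for visual scree-plot inspection, every function name, and the three toy rankings of four factors are all hypothetical.

```python
from typing import List, Sequence, Tuple


def mc4_transition_matrix(
    rankings: Sequence[Sequence[str]],
) -> Tuple[List[str], List[List[float]]]:
    """MC4-style transition matrix from best-first rankings.

    Assumes every ranking lists every factor (full rankings). From state i,
    a candidate j is drawn uniformly from all n factors; the chain moves to j
    only if a strict majority of rankings place j above i, else it stays.
    """
    items = sorted({f for r in rankings for f in r})
    n, m = len(items), len(rankings)
    # Position of each factor in each ranking (0 = most important).
    pos = [{f: k for k, f in enumerate(r)} for r in rankings]
    P = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(items):
        for j, b in enumerate(items):
            if i != j and 2 * sum(p[b] < p[a] for p in pos) > m:
                P[i][j] = 1.0 / n
        P[i][i] = 1.0 - sum(P[i])  # remaining mass: stay at the current factor
    return items, P


def stationary_distribution(
    P: List[List[float]], damping: float = 0.15, tol: float = 1e-12, max_iter: int = 10_000
) -> List[float]:
    """Power iteration with a uniform 'teleport' term (hypothetical choice)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [
            damping / n + (1.0 - damping) * sum(pi[i] * P[i][j] for i in range(n))
            for j in range(n)
        ]
        if sum(abs(x - y) for x, y in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def aggregate(rankings: Sequence[Sequence[str]], damping: float = 0.15) -> List[Tuple[str, float]]:
    """Return (factor, ensemble score) pairs sorted from most to least important."""
    items, P = mc4_transition_matrix(rankings)
    pi = stationary_distribution(P, damping)
    return sorted(zip(items, pi), key=lambda t: -t[1])


def elbow_cutoff(scores: Sequence[float]) -> int:
    """Keep the top-k factors that precede the largest drop in sorted scores."""
    s = sorted(scores, reverse=True)
    if len(s) < 2:
        return len(s)
    gaps = [s[i] - s[i + 1] for i in range(len(s) - 1)]
    return gaps.index(max(gaps)) + 1


# Toy example: three hypothetical filter methods ranking four hypothetical factors.
rankings = [
    ["slope", "lithology", "ndvi", "aspect"],
    ["slope", "ndvi", "lithology", "aspect"],
    ["lithology", "slope", "aspect", "ndvi"],
]
ranked = aggregate(rankings)
k = elbow_cutoff([score for _, score in ranked])
print(ranked)
print("selected:", [f for f, _ in ranked[:k]])
```

In this toy input the pairwise majority order is slope > lithology > ndvi > aspect, so the stationary scores follow that order. Because slope dominates, the largest-gap rule keeps only one factor here; with real importance scores, the cutoff from a scree plot (seven of 21 factors in the study) would be read from the shape of the whole curve.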
ISSN: 2255-8950, 2255-8942
DOI: 10.22364/bjmc.2022.10.2.09