Ensemble Conditioning Factor Selection with Markov Chain Framework for Shallow Landslide Susceptibility Mapping in Lake Sapanca Basin and its Vicinity, Turkey
Published in: | Baltic Journal of Modern Computing, 2022, Vol. 10 (2), pp. 224-240 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | The selection of landslide predisposing factors usually involves a degree of subjectivity, which can adversely affect the performance of the resulting predictive models. Although filter-based feature selection algorithms have been used extensively to discard irrelevant factors from the geospatial database, they suffer considerably from statistical biases. Another important limitation is the uncertainty about which feature selection method to choose among the wide array of available options. In this study, an ensemble feature selection strategy, namely the Markov Chain framework, was proposed to seek an optimal factor subset from filter-based factor selection results. To this end, 21 landslide conditioning factors were initially considered, and seven well-known filter-based feature selection techniques were used to determine the factor importance scores. Using a scree plot analysis, the proposed ensemble approach produced an optimal feature subset of seven conditioning factors after eliminating 14 factors (i.e., a reduction of about 66%). The random forest (RF) algorithm was then used to predict landslide susceptibility with both the optimal factor subset and all factors. The validation results indicated that the overall accuracy (OA) and area under the curve (AUC) obtained with the optimal subset were 90.983% and 94.561%, respectively. The RF model trained on the optimal subset outperformed the model trained on the whole dataset by more than 6% in terms of AUC. The performance differences were also confirmed by McNemar's test, which indicated statistically significant differences in all cases. |
ISSN: | 2255-8942, 2255-8950 |
DOI: | 10.22364/bjmc.2022.10.2.09 |
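
The abstract describes combining several filter-based factor rankings through a Markov chain, but this record does not say which chain construction the authors used. The sketch below is therefore only an illustration, assuming an MC4-style majority-vote chain with a small teleportation term; the function name `markov_chain_aggregate`, the `damping` parameter, and the factor names are hypothetical and are not taken from the article.

```python
import numpy as np

def markov_chain_aggregate(rankings, damping=0.95, n_iter=1000, tol=1e-12):
    """Aggregate several rankings (best first) of the same items.

    Assumed MC4-style chain: from item i, pick an item j uniformly at
    random and move to j only if a strict majority of rankings place j
    above i; otherwise stay at i. Items with more stationary mass are
    ranked higher.
    """
    items = sorted(rankings[0])
    n = len(items)
    idx = {name: k for k, name in enumerate(items)}

    # pos[r, k] = position of item k in ranking r (0 = best)
    pos = np.empty((len(rankings), n), dtype=int)
    for r, ranking in enumerate(rankings):
        for p, name in enumerate(ranking):
            pos[r, idx[name]] = p

    # better[i, j] = number of rankings that place j above i
    better = (pos[:, None, :] < pos[:, :, None]).sum(axis=0)
    majority = better > len(rankings) / 2

    # Transition matrix: probability 1/n for each majority-preferred j,
    # with the remaining probability kept as a self-loop on i.
    P = majority / n
    P[np.arange(n), np.arange(n)] = 1.0 - P.sum(axis=1)

    # Teleportation (an assumption of this sketch) keeps the chain
    # ergodic so that the stationary distribution is unique.
    P = damping * P + (1.0 - damping) / n

    # Power iteration for the stationary distribution
    pi = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        new_pi = pi @ P
        converged = np.abs(new_pi - pi).sum() < tol
        pi = new_pi
        if converged:
            break

    order = sorted(items, key=lambda name: -pi[idx[name]])
    return order, dict(zip(items, pi))

# Hypothetical rankings from three filter methods (illustrative names only)
rankings = [
    ["slope", "lithology", "ndvi", "aspect"],
    ["slope", "ndvi", "lithology", "aspect"],
    ["lithology", "slope", "ndvi", "aspect"],
]
order, scores = markov_chain_aggregate(rankings)
print(order)
```

In a workflow like the one summarized in the abstract, the sorted stationary scores could then be inspected (for example with a scree plot) to decide where to cut the factor list; that cutoff step is not shown here.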