Comparison of feature selection methods for mapping soil organic matter in subtropical restored forests


Bibliographic details
Published in: Ecological Indicators, 2022-02, Vol. 135, p. 108545, Article 108545
Main authors: Chen, Yang; Ma, Lixia; Yu, Dongsheng; Zhang, Haidong; Feng, Kaiyue; Wang, Xin; Song, Jie
Format: Article
Language: English
Online access: Full text
Description
Summary:
• Filter, wrapper, embedded and ensemble feature selection (FS) approaches were evaluated.
• The proposed ensemble FS incorporates ten individual selectors to identify the most significant variables.
• Ensemble FS produced the most predictive variable subset for mapping SOM.
• XGBoost outperformed RF in mapping SOM in subtropical restored forests.

Mapping soil organic matter (SOM) over a complex forest landscape is challenging because, with the recent explosion of geospatial data, it is difficult to select the most informative variables from high-dimensional datasets. Feature selection (FS) is necessary to reduce data redundancy and noise and to achieve more reliable spatial predictions of SOM. However, it remains unclear which of the various FS methods is most effective for mapping SOM. Therefore, four types of FS approaches (filter, wrapper, embedded and ensemble) were each used to generate an optimal variable subset from an original set of 60 candidate variables for mapping the SOM of restored forest land in a typical subtropical region of southern China. The most commonly used methods of each type were selected for this study: three filters (Chi-square, InfoGain and Pearson correlation analysis), three wrappers (genetic algorithm, simulated annealing algorithm and support vector machine-recursive feature elimination), three embedded methods (Boruta, random forest (RF) and extreme gradient boosting (XGBoost)), and one ensemble method (the robust rank aggregation (RRA) algorithm). The RF and XGBoost models were then applied with 10-fold cross-validation to compare the relative advantages of the different FS methods in SOM mapping, using the R² between observed and predicted values and the root mean square error (RMSE) of the predictions.
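The ensemble idea described above, merging the rankings produced by several individual selectors into one consensus ranking, can be illustrated with a small sketch. This is not the RRA algorithm named in the abstract: it is a simplified stand-in that averages rank positions, and the function name `aggregate_ranks`, the variable names and the three example rankings are all hypothetical.

```python
from statistics import mean


def aggregate_ranks(rankings, top_k):
    """Combine ranked feature lists by mean rank position.

    rankings: list of lists, each ordered from most to least important.
    A feature missing from a list is given the worst rank, len(list) + 1.
    Returns the top_k consensus features and the mean rank of every feature.
    NOTE: mean-rank aggregation is an illustrative assumption, not RRA.
    """
    features = sorted({f for ranking in rankings for f in ranking})
    scores = {}
    for feature in features:
        positions = []
        for ranking in rankings:
            if feature in ranking:
                positions.append(ranking.index(feature) + 1)
            else:
                positions.append(len(ranking) + 1)
        scores[feature] = mean(positions)
    # A lower mean rank means the feature is ranked highly more consistently.
    # Ties are broken alphabetically so the result is deterministic.
    ordered = sorted(features, key=lambda f: (scores[f], f))
    return ordered[:top_k], scores


# Hypothetical rankings from three individual selectors.
rankings = [
    ["elevation", "ndvi", "slope", "rainfall"],
    ["ndvi", "elevation", "rainfall", "slope"],
    ["elevation", "rainfall", "ndvi", "slope"],
]
selected, scores = aggregate_ranks(rankings, top_k=2)
print(selected)
```

A variable that only one selector ranks highly ends up with a mediocre mean rank, so the consensus favours variables that several selectors agree on.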
The results show that SOM prediction accuracy with the optimized variable subsets generated by the different FS methods is better than with the full variable set, although the size of the improvement differs among the four types of FS approaches. The ensemble method (RRA) is superior to the other three types, with an average RMSE reduction of 9.16% relative to prediction without FS, followed by the wrapper and embedded methods, with average RMSE reductions of 7.81% and 7.32%, respectively; the filter methods are the weakest, with a slight average reduction of 4.32%. The XGBoost model achi
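For readers who want to reproduce this kind of comparison, the two accuracy measures and the percentage RMSE reduction can be computed as in the following sketch. It assumes that R² means the squared Pearson correlation between observed and predicted values, as the abstract's wording suggests; the function names and the numbers are hypothetical and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


def rmse_reduction_pct(rmse_full, rmse_subset):
    """Percentage by which the subset model lowers RMSE versus the full model."""
    return 100.0 * (rmse_full - rmse_subset) / rmse_full


# Hypothetical observed values and predictions (not from the paper).
observed = [10.0, 20.0, 30.0, 40.0]
pred_full = [14.0, 16.0, 34.0, 36.0]    # model trained on all variables
pred_subset = [12.0, 18.0, 32.0, 38.0]  # model trained on a selected subset

rmse_full = rmse(observed, pred_full)
rmse_subset = rmse(observed, pred_subset)
reduction = rmse_reduction_pct(rmse_full, rmse_subset)
print(rmse_full, rmse_subset, reduction, r_squared(observed, pred_subset))
```

In a cross-validated setting, the predictions passed to these functions would be the held-out predictions collected over all folds.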
ISSN: 1470-160X, 1872-7034
DOI: 10.1016/j.ecolind.2022.108545