A framework for optimizing environmental covariates to support model interpretability in digital soil mapping


Bibliographic details
Published in: Geoderma 2024-05, Vol. 445, p. 116873, Article 116873
Main authors: Kasraei, Babak; Schmidt, Margaret G.; Zhang, Jin; Bulmer, Chuck E.; Filatow, Deepa S.; Arbor, Adrienne; Pennell, Travis; Heung, Brandon
Format: Article
Language: English
Description
Abstract:
• Effective digital soil mapping can thrive with minimal covariates.
• Principal components may not outperform variance inflation factor in modeling.
• Quantile Regression Post-Processing assesses uncertainty regardless of model.
• Variance inflation factor and recursive feature elimination ensure model parsimony.

A common practice in digital soil mapping (DSM) is to incorporate many environmental covariates into a machine-learning algorithm to predict the spatial patterns of soil attributes. Variance inflation factor (VIF), principal component analysis (PCA), and recursive feature elimination (RFE) are three statistical methods that can be used to reduce the number of covariates. This study aims 1) to compare the VIF and PCA approaches; 2) to identify an approach for determining the minimum number of covariates in DSM, ensuring model parsimony by applying RFE after VIF; and 3) to examine methods for interpreting the impact of covariates on the variability of the predicted soil properties. The study area was the province of British Columbia (BC), Canada. This study used legacy data for four soil properties to make digital soil maps: soil organic carbon (SOC%), pH, clay%, and coarse fragments (CF%). Seven models were built for each soil property to determine how the validation results change with the number of covariates selected by the various methods. The results showed that the number of covariates could be reduced from 70 to between 4 and 12 with little or no difference in the concordance correlation coefficient (CCC) validation results. The pH models using 70 and 7 covariates both achieved a CCC of 0.74, and for the other soil properties the difference was negligible. The validation results showed that PCA was less effective than VIF at reducing the number of covariates. Moreover, this study showed that covariates related to precipitation were the most important for modeling SOC%, soil pH, and clay%, whereas topographic covariates were the most influential for modeling soil CF%. This study emphasizes the potential benefits of combining various data reduction methods to achieve optimal outcomes and generate the most parsimonious and interpretable models.
ISSN: 0016-7061; 1872-6259
DOI:10.1016/j.geoderma.2024.116873
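
The abstract names two quantities that are easy to illustrate: VIF-based covariate screening and the concordance correlation coefficient (CCC) used for validation. The following is a minimal sketch, not the authors' implementation. It assumes a stepwise rule that drops the covariate with the largest VIF until all values fall below a threshold; the threshold of 10 is a common rule of thumb and is not taken from the record. The covariate names and synthetic data are hypothetical. The RFE step and the machine-learning models are not shown.

```python
import numpy as np


def vif(X):
    """Return the VIF of each column of X.

    VIF_j = 1 / (1 - R_j^2) equals the j-th diagonal element of the
    inverse of the covariates' correlation matrix.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


def vif_filter(X, names, threshold=10.0):
    """Drop the covariate with the largest VIF until all are <= threshold.

    The threshold is an assumption (rule of thumb), not a value from the study.
    """
    X = np.asarray(X, dtype=float)
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        scores = vif(X[:, keep])
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break
        del keep[worst]
    return [names[i] for i in keep]


def ccc(observed, predicted):
    """Lin's concordance correlation coefficient (population moments)."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    cov = np.mean((o - o.mean()) * (p - p.mean()))
    return 2.0 * cov / (o.var() + p.var() + (o.mean() - p.mean()) ** 2)


# Synthetic, hypothetical covariates: "c" is almost a linear
# combination of "a" and "b", so one of the three is redundant.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + b + 0.01 * rng.normal(size=500)
d = rng.normal(size=500)
X = np.column_stack([a, b, c, d])

selected = vif_filter(X, ["a", "b", "c", "d"])
print("covariates kept after VIF screening:", selected)
print("CCC of a shifted prediction:", ccc([1, 2, 3], [2, 3, 4]))
```

Unlike the Pearson correlation, the CCC penalizes systematic bias: a prediction that is perfectly correlated with the observations but shifted by a constant scores below 1.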