Influence of sample size, model selection, and land use on prediction accuracy of soil properties


Bibliographic details
Published in: Geoderma Regional 2024-03, Vol.36, p.e00766, Article e00766
Main authors: Safaee, Samira, Libohova, Zamir, Kladivko, Eileen J., Brown, Andrew, Winzeler, Edwin, Read, Quentin, Rahmani, Shams, Adhikari, Kabindra
Format: Article
Language: English
Online access: Full text
Description
Summary: Digital soil mapping (DSM) uses models that integrate field and laboratory data with environmental factors to predict soils and soil properties. The accuracy of predictions depends on the models used, the data collected, and the environmental factors. This study assesses the influence of sampling density and distribution, covariates, and modeling approach on the prediction accuracy of soil organic matter (SOM) and cation exchange capacity (CEC) at three sites in Indiana (ACRE, DPAC, SEPAC) with different management intensities and sampling designs. Ordinary Kriging (OK) and three machine learning models, Cubist (CB), Random Forest (RF), and Regression Kriging (RK), were used. The coefficient of determination (R²), root mean square error (RMSE), mean square error (MSE), concordance coefficient (ρc), and bias were used for the accuracy assessment. The accuracy of the predictions was influenced by the site, sample density, model type, soil property, and their interactions. Site was the single largest source of significant variation, followed by sampling density and model type, for both SOM and CEC. ACRE, with multiple fields and complex management practices, had a higher average RMSE and a wider range of RMSE for SOM than SEPAC and DPAC, which have uniform management. At ACRE, as the number of samples increased from 36 (6 points/ha) to 66 (12 points/ha), the RMSE decreased from 2.75 to 0.85 for SOM and from 17.38 to 3.61 for CEC, but it did not change with further increases up to 146 samples. At SEPAC and DPAC the RMSE decreased only slightly at sampling densities above 5 points/ha and 1–2 points/ha, respectively (68 and 43 samples, respectively). Based on cross-validation, all models performed poorly for SOM, with R² ranging from 0.13 to 0.38, while for CEC R² varied widely, from 0.11 to 0.64. Prediction accuracy was higher for CEC than for SOM at all sites. Overall, RF performed best and OK performed worst for both SOM and CEC. The mean R² values across all sites were 0.35 (SOM) and 0.51 (CEC) for RF and 0.19 (SOM) and 0.17 (CEC) for OK. At ACRE, OK performed worst for both SOM and CEC, with only slight differences among the other models, while at SEPAC and DPAC there were only slight differences among all models. Spatial predictions from CB, RF, and RK were more detailed and conformed better to soil-landscape models than those from OK. The spatial differences between sampling densities for predicted SOM and CEC were greater in lower elevation areas
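
The accuracy measures named in the summary (R², RMSE, MSE, concordance coefficient, and bias) can be illustrated with a minimal sketch. This is not the authors' code: the function name `accuracy_metrics` and the example values are hypothetical, R² is assumed here to be 1 − SSE/SST (the study may instead use the squared correlation), bias is assumed to be the mean of predicted minus observed, and the concordance coefficient is assumed to be Lin's concordance correlation coefficient.

```python
import math


def accuracy_metrics(observed, predicted):
    """Compute common validation metrics for paired observed/predicted values.

    Hypothetical helper for illustration; the definitions are assumptions,
    not taken from the study itself.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n

    # Prediction errors: predicted minus observed.
    errors = [p - o for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    bias = sum(errors) / n  # positive value means over-prediction on average

    # Population (divide-by-n) variances and covariance.
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    if var_o == 0:
        raise ValueError("observed values have zero variance")

    r2 = 1.0 - mse / var_o  # assumed definition: 1 - SSE/SST
    # Lin's concordance correlation coefficient.
    ccc = 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)

    return {"r2": r2, "mse": mse, "rmse": rmse, "bias": bias, "ccc": ccc}


if __name__ == "__main__":
    # Made-up values, only to show the call; not data from the study.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [2.0, 3.0, 4.0, 5.0]
    print(accuracy_metrics(observed, predicted))
```

In this made-up example every prediction is one unit too high, so the bias and RMSE both equal 1 while the concordance coefficient stays below 1 even though the two series are perfectly correlated. That is why the concordance coefficient is often reported alongside R².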
ISSN: 2352-0094
DOI: 10.1016/j.geodrs.2024.e00766