Performance of linear mixed models and random forests for spatial prediction of soil pH


Bibliographic details
Published in: Geoderma 2021-09, Vol.397, p.115079, Article 115079
Main authors: Makungwe, Mirriam; Chabala, Lydia Mumbi; Chishala, Benson H.; Lark, R. Murray
Format: Article
Language: English
Description
Summary:
•Legacy data were used to map spatial variability of soil pH.
•We compared performance of REML-EBLUP and random forest in prediction of soil pH.
•We addressed how to validate models robustly from highly clustered legacy data.
•REML-EBLUP (ordinary kriging in this case) performed better than the other methods.

Digital soil maps describe the spatial variation of soil properties and give policy makers a synoptic view of the state of the soil. This paper presents a study of how to map the spatial variation of soil pH across Zambia, undertaken as part of a project to assess suitability for rice production across the country. Legacy data on the target variable were available, along with exhaustive environmental covariates as potential predictor variables. We had the option of undertaking spatial prediction by geostatistical or machine learning methods. We set out to compare the approaches from the selection of predictor variables through to model validation, and to test the predictors on a set of validation observations. We also addressed the problem of how to validate models robustly from legacy data when these have, as is often the case, a strongly clustered spatial distribution. The validation statistics showed that the empirical best linear unbiased predictor (EBLUP) with a constant mean as its only fixed effect (ordinary kriging) performed better than the other methods. Random forests had the largest model-based estimates of the expected squared errors. We also noticed that the random forest algorithm was prone to select as “important” the spatially correlated random variables that we had simulated.
ISSN: 0016-7061, 1872-6259
DOI:10.1016/j.geoderma.2021.115079
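
Note on the method: the summary states that the EBLUP with a constant mean as its only fixed effect is ordinary kriging. The following is a minimal sketch of that predictor, not the authors' implementation. It assumes an exponential covariance function whose parameters (`sill`, `range_par`, `nugget`) are fixed by hand, whereas the paper estimates the variance parameters by REML. All function names, coordinates, and pH values are hypothetical and chosen only for illustration.

```python
import math


def exp_cov(h, sill=1.0, range_par=10.0, nugget=0.0):
    """Exponential covariance at lag h (parameter values are assumptions)."""
    if h == 0.0:
        return sill + nugget
    return sill * math.exp(-h / range_par)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ordinary_kriging(coords, values, target, cov=exp_cov):
    """Return (prediction, kriging variance, weights) at one target location."""
    n = len(coords)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Kriging system: [[C, 1], [1', 0]] [w; mu] = [c0; 1].
    # The last row forces the weights to sum to one (unknown constant mean).
    a = [[cov(dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    c0 = [cov(dist(p, target)) for p in coords]
    sol = solve(a, c0 + [1.0])
    w, mu = sol[:n], sol[n]
    pred = sum(wi * zi for wi, zi in zip(w, values))
    var = cov(0.0) - sum(wi * ci for wi, ci in zip(w, c0)) - mu
    return pred, var, w


# Hypothetical observations: four sites on a square with made-up pH values.
coords = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
ph = [5.2, 6.1, 5.8, 6.4]
pred, var, weights = ordinary_kriging(coords, ph, (5.0, 5.0))
print(pred, var, weights)
```

Because the target in this toy example is equidistant from four symmetrically placed sites, the weights are equal and the prediction is the plain average of the observations. With a zero nugget, predicting at an observed site returns the observed value with a kriging variance of zero.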