Improving the predictions of soil properties from VNIR–SWIR spectra in an unlabeled region using semi-supervised and active learning

Bibliographic details
Published in: Geoderma, April 2021, Vol. 387, Article 114830
Main authors: Tsakiridis, Nikolaos L.; Theocharis, John B.; Symeonidis, Andreas L.; Zalidis, George C.
Format: Article
Language: English
Description
Highlights:
• The semi-supervised learning framework of Laplacian Support Vector Regression (LapSVR) is presented.
• LapSVR outperforms other supervised techniques in VNIR–SWIR spectroscopy.
• The active learning framework is presented along with a novel query strategy.
• Active learning identifies the samples from the unknown region that ought to be labeled.
• The proposed methodologies statistically outperform their counterparts.

Abstract: Monitoring the status of the soil ecosystem, in order to identify the spatio-temporal extent of the pressures exerted on it and to mitigate the effects of climate change and land degradation, requires reliable and cost-effective solutions. Soil spectroscopy in the visible, near- and shortwave-infrared (VNIR–SWIR) has emerged as a viable alternative to traditional analytical approaches. To this end, large-scale soil spectral libraries coupled with advanced machine learning tools have been developed to infer soil properties from hyperspectral signatures. However, models developed for one region may perform worse when applied to a new region unseen by the model, owing to the large inherent variability of soils (e.g. pedogenetic differences, diverse soil types). Given an existing spectral library with labeled data and a new unlabeled region (i.e. one where no soil samples have been analytically measured), the question becomes how best to develop a model that more accurately predicts the soil properties of the unlabeled region. This paper proposes a machine learning technique that combines semi-supervised learning, which exploits the distribution of the predictors in the unlabeled dataset, with active learning, which selects a small set of samples from the unlabeled dataset to serve as a spiking subset, in order to develop a more robust model. The semi-supervised component is Laplacian Support Vector Regression, which follows the manifold regularization framework.
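The manifold regularization idea can be illustrated with a small, dependency-light sketch. LapSVR itself uses the ε-insensitive loss and requires a quadratic-programming solver; to keep the example self-contained, this sketch swaps in the squared loss, which yields Laplacian Regularized Least Squares (LapRLS) from the same framework and has a closed-form solution, α = (JK + γ_A·l·I + γ_I·l/(l+u)²·LK)⁻¹Y. Here K is the kernel matrix over the l labeled and u unlabeled samples, L is a graph Laplacian built from all of them, and J selects the labeled rows. The RBF kernel, the kNN heat-kernel graph, the hyperparameter values, and all function names are assumptions for illustration, not the authors' implementation; rows of X would be preprocessed spectra, while a 1-D toy signal stands in here.

```python
import numpy as np


def sq_dists(A, B):
    """Pairwise squared Euclidean distances between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.maximum(sq, 0.0)  # clip tiny negatives caused by round-off


def rbf_kernel(A, B, gamma=1.0):
    return np.exp(-gamma * sq_dists(A, B))


def knn_graph_laplacian(X, k=5, sigma=1.0):
    """Unnormalised Laplacian L = D - W of a symmetrised kNN graph with heat-kernel weights."""
    n = X.shape[0]
    d2 = sq_dists(X, X)
    np.fill_diagonal(d2, np.inf)              # a sample is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]      # indices of the k nearest neighbours
    rows, cols = np.repeat(np.arange(n), k), nbrs.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / (2.0 * sigma**2))
    W = np.maximum(W, W.T)                    # connect i and j if either is a neighbour of the other
    return np.diag(W.sum(axis=1)) - W


def fit_laprls(X_lab, y_lab, X_unl, gamma_A=1e-3, gamma_I=1e-2, kernel_gamma=1.0, k=5):
    """Hypothetical LapRLS fit: squared loss on labeled rows + RKHS norm + graph smoothness."""
    X = np.vstack([X_lab, X_unl])
    l, n = X_lab.shape[0], X.shape[0]
    K = rbf_kernel(X, X, kernel_gamma)
    L = knn_graph_laplacian(X, k)
    J = np.diag(np.concatenate([np.ones(l), np.zeros(n - l)]))  # selects the labeled rows
    Y = np.concatenate([y_lab, np.zeros(n - l)])                 # zero placeholders: no loss on unlabeled rows
    A = J @ K + gamma_A * l * np.eye(n) + (gamma_I * l / n**2) * (L @ K)
    alpha = np.linalg.solve(A, Y)
    return X, alpha


def predict_laprls(X_train, alpha, X_new, kernel_gamma=1.0):
    # f(x) = sum_i alpha_i K(x_i, x) over labeled AND unlabeled training samples
    return rbf_kernel(X_new, X_train, kernel_gamma) @ alpha


# Toy usage: 20 labeled and 40 unlabeled samples of a smooth 1-D signal.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(60, 1))
y = np.sin(X[:, 0])
X_train, alpha = fit_laprls(X[:20], y[:20], X[20:])
y_hat = predict_laprls(X_train, alpha, X[20:])
rmse = np.sqrt(np.mean((y_hat - y[20:]) ** 2))
```

The unlabeled samples enter only through the Laplacian term, which penalises predictions that change quickly between neighbouring samples; increasing gamma_I pulls the fitted function towards the geometry of the unlabeled region.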
For the active learning component, the pool-based approach is used, as it best matches the use-case scenario described above: it iteratively selects a subset of data from the unlabeled region to spike the calibration set. As the query strategy, a novel machine learning–based strategy is proposed to identify the best spiking subset at each iteration. The experimental analysis was conducted using data from the Land Use and Coverage Area Frame Survey of 2009, which covered most of the then member states of the European Union, and in particul…
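The pool-based loop can be sketched as follows. The paper's own machine learning–based query strategy is not described in this abstract, so the sketch swaps in a different, standard strategy: greedy sampling in the input space, which at each step queries the pool sample farthest from everything already in the calibration set. Kernel ridge regression stands in for the calibration model, and `oracle` is a hypothetical stand-in for laboratory analysis of the queried samples; all names and defaults are assumptions for illustration.

```python
import numpy as np


def sq_dists(A, B):
    """Pairwise squared Euclidean distances between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.maximum(sq, 0.0)


def fit_krr(X, y, lam=1e-2, gamma=1.0):
    """Kernel ridge regression with an RBF kernel; returns a predict function."""
    K = np.exp(-gamma * sq_dists(X, X))
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return lambda X_new: np.exp(-gamma * sq_dists(X_new, X)) @ alpha


def greedy_sampling_query(X_ref, X_pool, available, batch_size):
    """Pick pool samples one by one, each maximising its distance to the reference set."""
    X_ref = X_ref.copy()
    chosen = []
    for _ in range(min(batch_size, int(available.sum()))):
        cand = np.flatnonzero(available)
        d = sq_dists(X_pool[cand], X_ref).min(axis=1)  # distance to nearest already-covered sample
        best = int(cand[np.argmax(d)])
        chosen.append(best)
        available[best] = False                        # remove from the pool
        X_ref = np.vstack([X_ref, X_pool[best:best + 1]])
    return chosen


def pool_based_al(X_cal, y_cal, X_pool, oracle, n_iter=3, batch_size=2):
    """Iteratively spike the calibration set with queried pool samples, then refit."""
    available = np.ones(len(X_pool), dtype=bool)
    X_lab, y_lab, selected = X_cal.copy(), y_cal.copy(), []
    for _ in range(n_iter):
        idx = greedy_sampling_query(X_lab, X_pool, available, batch_size)
        if not idx:                                    # pool exhausted
            break
        selected.extend(idx)
        X_lab = np.vstack([X_lab, X_pool[idx]])
        y_lab = np.concatenate([y_lab, oracle(idx)])   # "measure" the queried samples
        # A model-based query strategy would refit here and use the model to rank the pool.
    return fit_krr(X_lab, y_lab), selected


# Toy usage: a tiny calibration set near 0 and a pool spanning [-3, 3].
X_cal = np.array([[0.0], [0.1]])
y_cal = np.sin(X_cal[:, 0])
X_pool = np.linspace(-3.0, 3.0, 13).reshape(-1, 1)
oracle = lambda idx: np.sin(X_pool[idx, 0])            # hypothetical lab measurement
model, selected = pool_based_al(X_cal, y_cal, X_pool, oracle, n_iter=2, batch_size=2)
print(selected)  # → [0, 12, 3, 9]
```

Greedy sampling needs no model to rank the pool, which makes it a convenient baseline; strategies that use the current model to score candidates, like the one proposed in the paper, would instead refit the calibration model at every iteration before querying.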
ISSN: 0016-7061, 1872-6259
DOI: 10.1016/j.geoderma.2020.114830