Ensemble learning-based crop yield estimation: a scalable approach for supporting agricultural statistics

Bibliographic details
Published in: GIScience and remote sensing 2024-12, Vol. 61 (1)
Main authors: Brandt, Patric, Beyer, Florian, Borrmann, Peter, Möller, Markus, Gerighausen, Heike
Format: Article
Language: eng
Description
Summary: Detailed and accurate statistics on crop productivity are key to informing decision-making on sustainable food production and supply, and thereby to ensuring global food security. However, annual and high-resolution crop yield data provided by official agricultural statistics are generally lacking. Earth observation (EO) imagery, geodata on meteorological and soil conditions, and advances in machine learning (ML) provide huge opportunities for model-based crop yield estimation that covers large spatial scales with unprecedented granularity. This study proposes a novel yield estimation approach that is bottom-up scalable from parcel to administrative levels by leveraging ML-ensembles, comprising six regression estimators (base estimators), and multi-source geodata, including EO imagery. To ensure the approach's robustness, two ensemble learning techniques are investigated, namely meta-learning through model stacking and majority voting. ML-ensembles were evaluated multi-annually and crop-specifically for three major winter crops, namely winter wheat (WW), winter barley (WB), and winter rapeseed (WR), in two German federal states, covering 140,000 to 155,000 parcels per year. The evaluation was carried out at the parcel and district levels against official yield reports from 2019 to 2022, based on metrics such as the coefficient of determination ($RSQ$) and the normalized root mean square error ($nRMSE$). Overall, the most robustly performing ensemble learning technique was majority voting, which yielded, through cross-validation at the parcel level, $RSQ$ and $nRMSE$ values of 0.74 and 13.4% for WW, 0.68 and 16.9% for WB, and 0.66 and 14.1% for WR, respectively. At the district level, majority voting reached $RSQ$ and $nRMSE$ ranges of 0.79-0.89 and 7.2-8.1% for WW, 0.80-0.84 and 6.0-9.9% for WB, and 0.60-0.78 and 6.1-10.4% for WR, respectively.
Capitalizing on ensemble learning-based majority voting, examples of unprecedented high-resolution crop yield maps at $1 \times 1$ km spatial resolution are presented. Implementing a scalable yield estimation approach, as proposed in this study, into crop yield reporting frameworks of public authorities mandated to provide official agricultural statistics would increase the spatial resolution of annually reported yields, eventually covering the entire cropland available. Such unprecedented data products delivered through map services may improve decision-m
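The voting ensemble, the two evaluation metrics, and the parcel-to-district scaling named in the summary can be illustrated with a minimal sketch. It rests on several assumptions that the summary does not confirm: voting is taken to be an unweighted average of the base estimators' predictions (a common reading of voting for regression), $nRMSE$ is normalized by the mean observed yield, and district yields are area-weighted means of parcel estimates. All function names and all numbers are hypothetical and serve illustration only; none are taken from the study.

```python
from math import sqrt
from statistics import mean


def vote(predictions_per_estimator):
    """Combine base estimators by averaging their predictions per parcel.

    Assumption: unweighted averaging; the study's exact voting rule
    is not given in the summary.
    """
    return [mean(parcel) for parcel in zip(*predictions_per_estimator)]


def rsq(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def nrmse(observed, predicted):
    """RMSE as a percentage of the mean observed yield.

    Assumption: normalization by the mean; other conventions
    (e.g. by the range) exist.
    """
    rmse = sqrt(mean([(o - p) ** 2 for o, p in zip(observed, predicted)]))
    return 100.0 * rmse / mean(observed)


def district_yield(parcel_yields, parcel_areas):
    """Area-weighted mean yield of all parcels in one district (assumption)."""
    weighted = sum(y * a for y, a in zip(parcel_yields, parcel_areas))
    return weighted / sum(parcel_areas)


# Hypothetical toy data: four parcels, yields in t/ha, areas in ha.
observed = [8.0, 7.0, 9.0, 6.0]
base_predictions = [
    [7.5, 7.2, 8.6, 6.4],  # hypothetical base estimator A
    [8.1, 6.9, 9.3, 5.9],  # hypothetical base estimator B
    [8.4, 7.2, 8.8, 6.3],  # hypothetical base estimator C
]
areas = [10.0, 20.0, 5.0, 15.0]

ensemble = vote(base_predictions)
print("RSQ:", round(rsq(observed, ensemble), 3))
print("nRMSE (%):", round(nrmse(observed, ensemble), 2))
print("District yield (t/ha):", round(district_yield(ensemble, areas), 2))
```

Model stacking would replace the plain average in `vote` with a meta-learner trained on the base estimators' predictions; that step is omitted here to keep the sketch free of external dependencies.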
ISSN: 1548-1603, 1943-7226
DOI:10.1080/15481603.2024.2367808