FARMSAR: Fixing AgRicultural Mislabels Using Sentinel-1 Time Series and AutoencodeRs
Published in: | Remote Sensing (Basel, Switzerland), 2023-01, Vol.15 (1), p.35 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | This paper aims to quantify the errors in the provided agricultural crop types, estimate the possible error rate in the available dataset, and propose a correction strategy. This quantification could establish a confidence criterion useful for decisions based on this data, or give a better understanding of the possible consequences of using this data to learn downstream functions such as classification. We consider two kinds of agricultural label errors: crop type mislabels and mis-split crops. To process and correct these errors, we design a two-step methodology. Using class-specific convolutional autoencoders applied to synthetic aperture radar (SAR) time series of free-to-use and temporally dense Sentinel-1 data, we detect out-of-distribution temporal profiles of crop time series, which we categorize as one of the following three possibilities: crop edge confusion, incorrectly split crop areas, and potentially mislabeled crops. We then relabel crops flagged as mislabeled using an Otsu threshold-derived confidence criterion. We numerically validate our methodology using a controlled disruption of labels over crops of confidence. We then compare our method to supervised algorithms and show improved quality of relabels, with up to 98% correct relabels for our method, against up to 91% for Random Forest-based approaches. We show a drastic decrease in the performance of supervised algorithms under critical conditions (smaller and larger amounts of introduced label errors), with Random Forest falling to 56% correct relabels against 95% for our approach. We also make explicit the trade-off, made in the design of our method, between the number of relabels and their quality. In addition, we apply this methodology to a set of agricultural labels containing probable mislabels. We also validate the quality of the corrections using optical imagery, which helps highlight incorrectly cut crops and potential mislabels. 
We then assess the applicability of the proposed method in various contexts and scales and present how it is suitable for verifying and correcting farmers’ crop declarations. |
---|---|
ISSN: | 2072-4292 |
DOI: | 10.3390/rs15010035 |
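
The abstract mentions an Otsu threshold-derived confidence criterion without giving details. As a rough illustration only, the sketch below applies Otsu's method to a set of per-parcel anomaly scores. It assumes the scores are autoencoder reconstruction errors, which the abstract does not state, and the function name `otsu_threshold`, the synthetic scores, and the bin count are all hypothetical choices made for this example, not the authors' implementation.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Return the bin edge that maximizes between-class variance (Otsu's method)."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    weights = hist.astype(float)

    w0 = np.cumsum(weights)            # weight of the "low" class up to each bin
    w1 = w0[-1] - w0                   # weight of the "high" class above each bin
    s0 = np.cumsum(weights * centers)  # running sum used for the class means

    # Only evaluate splits that leave both classes non-empty.
    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros_like(w0)
    m0 = s0[valid] / w0[valid]
    m1 = (s0[-1] - s0[valid]) / w1[valid]
    between[valid] = w0[valid] * w1[valid] * (m0 - m1) ** 2

    # Use the upper edge of the chosen bin so the whole bin stays in the low class.
    return edges[1:][np.argmax(between)]


# Synthetic scores (assumption): 500 in-distribution parcels with low error,
# 50 out-of-distribution parcels with high error.
rng = np.random.default_rng(0)
errors = np.concatenate([
    rng.normal(0.1, 0.02, 500),
    rng.normal(1.0, 0.05, 50),
])

threshold = otsu_threshold(errors)
flagged = errors > threshold  # candidate mislabels under this toy criterion
print(f"threshold={threshold:.3f}, flagged={int(flagged.sum())} of {errors.size}")
```

Otsu's method needs no labeled examples of errors, which fits an unsupervised setting like the one the abstract describes. It does assume the scores are roughly bimodal, so the threshold may be less meaningful when flagged and unflagged parcels overlap heavily.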