Boosting Crowdsourced Annotation Accuracy: Small Loss Filtering and Augmentation-Driven Training

Detailed Description

Bibliographic Details
Published in: IEEE Access, 2024, Vol. 12, pp. 101745-101755
Main Authors: Fu, Yanming, Han, Weigeng, Yang, Jingsang, Lu, Haodong, Yu, Xin
Format: Article
Language: English
Subjects:
Online Access: Full text
Description
Summary: Crowdsourcing platforms provide an efficient and cost-effective means to acquire the extensive labeled data necessary for supervised learning. However, the labels provided by untrained crowdsourcing workers often contain a considerable amount of noise. Although applying ground truth inference algorithms to deduce integrated labels effectively enhances label quality, a certain level of noise persists. To further diminish the noise in crowdsourced labels, this paper introduces a novel Small Loss-based Noise Correction algorithm (SLNC). SLNC first filters the crowdsourced data, leveraging the tendency of neural networks to preferentially fit clean samples, thereby partitioning the data into a relatively clean set and a noisy set. It then employs data augmentation techniques to enlarge the clean set and trains a corrector on this augmented set to rectify the noisy set. SLNC has been evaluated on 16 simulated datasets and two real-world datasets. The results indicate that SLNC surpasses comparative algorithms in the quality of the final labels.
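The small-loss filtering step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes per-sample losses have already been recorded after a few warm-up epochs, and the `clean_fraction` hyperparameter is hypothetical.

```python
import numpy as np

def small_loss_partition(losses, clean_fraction=0.7):
    """Split sample indices into a relatively clean set and a noisy set.

    Exploits the observation that neural networks tend to fit clean
    samples before noisy ones, so samples with the smallest training
    loss are treated as clean. `clean_fraction` is an assumed knob,
    not a parameter taken from the paper.
    """
    losses = np.asarray(losses)
    n_clean = int(len(losses) * clean_fraction)
    order = np.argsort(losses)            # ascending: smallest loss first
    clean_idx = np.sort(order[:n_clean])  # candidates for the clean set
    noisy_idx = np.sort(order[n_clean:])  # remainder goes to the noisy set
    return clean_idx, noisy_idx

# Example: per-sample losses for 10 samples after warm-up training
losses = [0.1, 2.3, 0.2, 1.9, 0.05, 0.3, 2.8, 0.15, 0.4, 2.1]
clean, noisy = small_loss_partition(losses, clean_fraction=0.6)
# clean -> [0 2 4 5 7 8], noisy -> [1 3 6 9]
```

In the full SLNC pipeline, the clean set would then be augmented and used to train the corrector that relabels the noisy set.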
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3432729