Learning from crowds with robust support vector machines

Bibliographic details
Published in:	Science China. Information sciences 2023-03, Vol.66 (3), p.132103, Article 132103
Main authors:	Yang, Wenjun; Li, Chaoqun; Jiang, Liangxiao
Format:	Article
Language:	English
Subjects:	Algorithms; Classifiers; Computer Science; Crowdsourcing; Datasets; Information Systems and Communication Service; Labels; Machine learning; Noise levels; Quadratic programming; Research Paper; Robustness; Statistical inference; Support vector machines
Online access:	Full text

Description
Abstract:	Crowdsourcing systems provide an easy way to obtain labeled training data. However, the labels provided by non-expert labelers are often of low quality, so in practice each sample usually receives a set of multiple labels from different labelers. Learning-from-crowds (LFC) aims to design ground truth inference algorithms that infer the unknown true labels of data from these multiple label sets. Despite their sound statistical foundations, existing ground truth inference algorithms show limited performance when the number of labelers is small, yet more labelers mean higher costs. This paper proposes a novel ground truth inference algorithm that can maintain moderate performance while reducing labeling costs. It addresses LFC from the perspective of robust classifiers and presents a new label-noise-robust support vector machine inference (RSVMI) algorithm. We prove that only one convex quadratic programming problem needs to be solved to build a robust support vector machine. Furthermore, to apply the robust support vector machine to crowdsourced data, two methods are proposed to estimate the noise level of integrated labels. By transforming the original LFC problem into a robust classifier learning problem, our algorithm shows good performance when the number of labelers is very small. In our experiments, the minimum number of labelers is set to 3. In terms of both label quality and model quality, the experimental results on benchmark data sets and real-world data sets show the effectiveness of RSVMI.
ISSN:	1674-733X, 1869-1919
DOI:	10.1007/s11432-020-3067-8