Learning from crowds with robust support vector machines
Published in: | Science China. Information sciences 2023-03, Vol.66 (3), p.132103, Article 132103 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Crowdsourcing systems provide an easy way to obtain labeled training data. However, the labels provided by non-expert labelers are often of low quality, so in practice each sample usually receives a multiple label set from several different labelers. Learning from crowds (LFC) aims to design ground truth inference algorithms that infer the unknown true labels of the data from these multiple label sets. Despite their sound statistical foundations, existing ground truth inference algorithms show limited performance when the number of labelers is small, yet more labelers mean higher costs. This paper proposes a novel ground truth inference algorithm that maintains moderate performance while reducing labeling costs. It addresses LFC from the point of view of robust classifiers and presents a new label-noise-robust support vector machine inference (RSVMI) algorithm. We prove that only one convex quadratic programming problem needs to be solved to build a robust support vector machine. Furthermore, to apply the robust support vector machine to crowdsourced data, two methods are proposed to estimate the noise level of the integrated labels. By transforming the original LFC problem into a robust classifier learning problem, our algorithm performs well even when the number of labelers is very small; in our experiments, the minimum number of labelers is set to 3. In terms of both label quality and model quality, experimental results on benchmark and real-world data sets show the effectiveness of RSVMI. |
---|---|
ISSN: | 1674-733X 1869-1919 |
DOI: | 10.1007/s11432-020-3067-8 |
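
The setting described in the abstract (three labelers per sample, one integrated label per sample, and an estimate of how noisy the labels are) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the abstract does not say how labels are integrated, so plain majority voting is used, and the disagreement rate below is a crude proxy for labeler noise, not either of the paper's two noise-level estimators. The function names and the toy data are hypothetical, and RSVMI itself is not implemented.

```python
from collections import Counter


def majority_vote(label_sets):
    """Integrate each sample's multiple label set into one label.

    Assumption: simple majority voting; the abstract does not name
    the integration rule used by the paper.
    """
    integrated = []
    for labels in label_sets:
        counts = Counter(labels)
        # most_common(1) yields the (label, count) pair with the highest count
        integrated.append(counts.most_common(1)[0][0])
    return integrated


def disagreement_rate(label_sets, integrated):
    """Fraction of individual votes that differ from the integrated label.

    Assumption: a crude stand-in for a noise-level estimate, not one of
    the two estimators proposed in the paper.
    """
    total = sum(len(labels) for labels in label_sets)
    differing = sum(
        1
        for labels, y in zip(label_sets, integrated)
        for vote in labels
        if vote != y
    )
    return differing / total


# Hypothetical toy data: 4 samples, 3 labelers each, binary labels in {-1, +1}
label_sets = [[1, 1, -1], [-1, -1, -1], [1, -1, 1], [-1, 1, -1]]
integrated = majority_vote(label_sets)
noise_proxy = disagreement_rate(label_sets, integrated)
```

With an odd number of labelers and binary labels, majority voting never ties, which is one practical reason a minimum of 3 labelers is a convenient lower bound in a sketch like this.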