Recovering Top-Two Answers and Confusion Probability in Multi-Choice Crowdsourcing
Crowdsourcing has emerged as an effective platform for labeling large amounts of data in a cost- and time-efficient manner. Most previous work has focused on designing an efficient algorithm to recover only the ground-truth labels of the data. In this paper, we consider multi-choice crowdsourcing ta...
Gespeichert in:
Hauptverfasser: | , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Crowdsourcing has emerged as an effective platform for labeling large amounts
of data in a cost- and time-efficient manner. Most previous work has focused on
designing an efficient algorithm to recover only the ground-truth labels of the
data. In this paper, we consider multi-choice crowdsourcing tasks with the goal
of recovering not only the ground truth, but also the most confusing answer and
the confusion probability. The most confusing answer provides useful
information about the task by revealing the most plausible answer other than
the ground truth and how plausible it is. To theoretically analyze such
scenarios, we propose a model in which there are the top two plausible answers
for each task, distinguished from the rest of the choices. Task difficulty is
quantified by the probability of confusion between the top two, and worker
reliability is quantified by the probability of giving an answer among the top
two. Under this model, we propose a two-stage inference algorithm to infer both
the top two answers and the confusion probability. We show that our algorithm
achieves the minimax optimal convergence rate. We conduct both synthetic and
real data experiments and demonstrate that our algorithm outperforms other
recent algorithms. We also show the applicability of our algorithms in
inferring the difficulty of tasks and in training neural networks with top-two
soft labels. |
---|---|
DOI: | 10.48550/arxiv.2301.00006 |