On the Relationship between Data Efficiency and Error for Uncertainty Sampling
Main authors: | , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Summary: | While active learning offers potential cost savings, the actual data
efficiency---the reduction in amount of labeled data needed to obtain the same
error rate---observed in practice is mixed. This paper poses a basic question:
when is active learning actually helpful? We provide an answer for logistic
regression with the popular active learning algorithm, uncertainty sampling.
Empirically, on 21 datasets from OpenML, we find a strong inverse correlation
between data efficiency and the error rate of the final classifier.
Theoretically, we show that for a variant of uncertainty sampling, the
asymptotic data efficiency is within a constant factor of the inverse error
rate of the limiting classifier. |
---|---|
DOI: | 10.48550/arxiv.1806.06123 |
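
The summary defines data efficiency as the reduction in labeled data needed to reach the same error rate. As a rough illustration, the sketch below computes it as a ratio of label counts read off two learning curves. The ratio form, the function names `labels_needed` and `data_efficiency`, and the curve values are assumptions made for this example and are not taken from the paper.

```python
from typing import Optional, Sequence, Tuple

# A learning curve as (number of labels, error rate) pairs, sorted by label count.
Curve = Sequence[Tuple[int, float]]


def labels_needed(curve: Curve, target_error: float) -> Optional[int]:
    """Return the smallest label count on the curve with error <= target_error."""
    for n_labels, error in curve:
        if error <= target_error:
            return n_labels
    return None  # the curve never reaches the target error


def data_efficiency(passive: Curve, active: Curve, target_error: float) -> Optional[float]:
    """Ratio of labels needed by passive learning to labels needed by active learning.

    Assumption: "reduction in amount of labeled data" is expressed as a ratio,
    so a value above 1 means active learning needed fewer labels.
    """
    n_passive = labels_needed(passive, target_error)
    n_active = labels_needed(active, target_error)
    if n_passive is None or n_active is None:
        return None
    return n_passive / n_active


# Hypothetical learning curves, invented for illustration only.
passive_curve = [(100, 0.20), (200, 0.12), (400, 0.08), (800, 0.05)]
active_curve = [(100, 0.15), (200, 0.05), (400, 0.04)]

efficiency = data_efficiency(passive_curve, active_curve, target_error=0.05)
print(efficiency)
```

With these invented curves, passive learning reaches 5% error at 800 labels and active learning at 200, so the ratio is 4. Reading label counts off discrete curve points is a simplification; interpolating between points would give a smoother estimate.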