Active Learning With Sampling by Uncertainty and Density for Data Annotations

Bibliographic details
Published in:	IEEE Transactions on Audio, Speech, and Language Processing, 2010-08, Vol. 18 (6), p. 1323-1331
Main authors:	Zhu, J; Wang, H; Tsou, B K; Ma, M
Format:	Article
Language:	eng
Subjects:	Active learning; Applied sciences; density-based re-ranking; Exact sciences and technology; Humans; Information, signal and communications theory; Large-scale systems; Machine learning; Natural language processing; Natural languages; sampling by uncertainty and density; Sampling methods; Sampling, quantization; Signal and communications theory; Signal representation. Spectral analysis; Signal, noise; Supervised learning; Telecommunications and information theory; Text categorization; text classification; Training data; Uncertainty; uncertainty sampling; word sense disambiguation (WSD)

Description
Abstract:	To solve the knowledge bottleneck problem, active learning has been widely used for its ability to automatically select the most informative unlabeled examples for human annotation. One of the key enabling techniques of active learning is uncertainty sampling, which uses one classifier to identify unlabeled examples with the least confidence. Uncertainty sampling often presents problems when outliers are selected. To solve the outlier problem, this paper presents two techniques, sampling by uncertainty and density (SUD) and density-based re-ranking. Both techniques prefer not only the most informative example in terms of uncertainty criterion, but also the most representative example in terms of density criterion. Experimental results of active learning for word sense disambiguation and text classification tasks using six real-world evaluation data sets demonstrate the effectiveness of the proposed methods.
ISSN:	1558-7916, 2329-9290, 1558-7924, 2329-9304
DOI:	10.1109/TASL.2009.2033421
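
Illustrative sketch (not taken from the article): the abstract describes combining an uncertainty criterion with a density criterion so that uncertain outliers are not selected, but this record gives no formulas. The Python sketch below therefore makes its own assumptions: uncertainty is the entropy of the classifier's predicted class probabilities, density is the mean cosine similarity of an example to its k most similar unlabeled neighbors, and the two are multiplied into one score. The function names, the parameter k, and the toy data are all hypothetical.

```python
import numpy as np


def entropy_uncertainty(probs):
    """Shannon entropy of each row of predicted class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(p * np.log(p), axis=1)


def knn_density(X, k):
    """Mean cosine similarity of each row of X to its k most similar other rows."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.maximum(norms, 1e-12)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # an example is not its own neighbor
    top_k = np.sort(sim, axis=1)[:, -k:]  # k largest similarities per row
    return top_k.mean(axis=1)


def select_by_uncertainty_and_density(probs, X, k=2, batch_size=1):
    """Rank unlabeled examples by uncertainty * density (assumed combination)."""
    scores = entropy_uncertainty(probs) * knn_density(X, k)
    return np.argsort(-scores)[:batch_size], scores


# Toy unlabeled pool: three similar examples and one outlier (index 3).
# Features are assumed non-negative, as in bag-of-words vectors.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]])
# Hypothetical classifier outputs; the outlier is the most uncertain example.
probs = np.array([[0.55, 0.45], [0.90, 0.10], [0.60, 0.40], [0.50, 0.50]])

uncertainty = entropy_uncertainty(probs)
picked, scores = select_by_uncertainty_and_density(probs, X, k=2)
print("pure uncertainty picks:", int(np.argmax(uncertainty)))
print("uncertainty * density picks:", int(picked[0]))
```

On this toy pool, pure uncertainty sampling would pick the outlier at index 3, whereas the combined score picks index 0, an uncertain example that lies in a dense region. Multiplying the two criteria is only one possible combination; the density-based re-ranking technique named in the abstract is not sketched here.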