A Probabilistic Active Learning Algorithm Based on Fisher Information Ratio

Detailed description

Bibliographic details
Published in:	IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018-08, Vol. 40 (8), pp. 2023-2029
Main authors:	Sourati, Jamshid; Akcakaya, Murat; Erdogmus, Deniz; Leen, Todd K.; Dy, Jennifer G.
Format:	Article
Language:	English
Subjects:	Active learning; Algorithms; Approximation algorithms; Computational complexity; Computer Simulation; Databases, Factual - statistics & numerical data; discriminative classification; Finite impulse response filters; Fisher information; Humans; Information theory; Labels; Machine learning; Models, Statistical; Monte Carlo Method; Optimization; Probabilistic logic; probabilistic querying; Proposals; Queries; Training

Description
Abstract:	The task of labeling samples is demanding and expensive. Active learning aims to generate the smallest possible training data set that results in a classifier with high performance in the test phase. It usually consists of two steps: selecting a set of queries and requesting their labels. Among the objectives suggested for scoring query sets, information-theoretic measures have become very popular. Among these, measures based on Fisher information (FI) have the advantages of accounting for diversity among the queries and of tractable computation. In this work, we provide a practical algorithm based on the Fisher information ratio to obtain a query distribution for a general framework where, in contrast to previous FI-based querying methods, we make no assumptions about the test distribution. The empirical results on synthetic and real-world data sets indicate that this algorithm gives competitive results.
ISSN:	0162-8828, 1939-3539, 2160-9292
DOI:	10.1109/TPAMI.2017.2743707
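
Illustrative sketch: the abstract describes scoring a query distribution with a Fisher information ratio but does not give the formula. The Python below is a minimal sketch under stated assumptions, not the authors' algorithm: it assumes a binary logistic regression model, the commonly used ratio form tr(I_q^{-1} I_ref), and the empirical distribution of the unlabeled pool as the reference. All function names (`fisher_information`, `fisher_information_ratio`, `sample_queries`) and the `ridge` parameter are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def fisher_information(X, theta, weights, ridge=1e-8):
    """Weighted Fisher information matrix of binary logistic regression.

    Assumption: each sample x contributes p(1 - p) * x x^T, with
    p = sigmoid(x @ theta). A small ridge term keeps the matrix invertible.
    """
    p = sigmoid(X @ theta)
    s = weights * p * (1.0 - p)  # weighted per-sample curvature
    return (X * s[:, None]).T @ X + ridge * np.eye(X.shape[1])


def fisher_information_ratio(X, theta, q, ref=None):
    """Return tr(I_q^{-1} I_ref) for a query distribution q over the pool.

    Assumption: when no reference weights are given, the uniform
    distribution over the pool is used as the reference.
    """
    n = X.shape[0]
    if ref is None:
        ref = np.full(n, 1.0 / n)
    I_q = fisher_information(X, theta, q)
    I_ref = fisher_information(X, theta, ref)
    # Solve the linear system instead of forming an explicit inverse.
    return float(np.trace(np.linalg.solve(I_q, I_ref)))


def sample_queries(q, batch_size, rng):
    """Draw distinct pool indices according to the query distribution q."""
    return rng.choice(len(q), size=batch_size, replace=False, p=q)
```

Under these assumptions, a lower ratio indicates a query distribution whose Fisher information better covers the reference distribution. Drawing the batch at random from `q`, rather than taking the top-scoring samples, mirrors the probabilistic querying idea named in the subject terms.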