Is margin all you need? An extensive empirical study of active learning on tabular data
Saved in:
Main Authors: , , ,
Format: Article
Language: English
Subjects:
Online Access: Order full text
Abstract: Given a labeled training set and a collection of unlabeled data, the goal of active learning (AL) is to identify the best unlabeled points to label. In this comprehensive study, we analyze the performance of a variety of AL algorithms on deep neural networks trained on 69 real-world tabular classification datasets from the OpenML-CC18 benchmark. We consider different data regimes and the effect of self-supervised model pre-training. Surprisingly, we find that the classical margin sampling technique matches or outperforms all others, including the current state of the art, in a wide range of experimental settings. We hope to encourage researchers to benchmark rigorously against margin, and practitioners facing tabular data labeling constraints to consider that hyperparameter-free margin may often be all they need.
DOI: 10.48550/arxiv.2210.03822
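The classical margin sampling technique highlighted in the abstract can be sketched in a few lines: score each unlabeled point by the gap between its top two predicted class probabilities and label the points with the smallest gap. This is a minimal illustration of the general idea, not the paper's code; the function name `margin_sample` and the toy probabilities are assumptions for demonstration.

```python
# Margin sampling sketch (illustrative, not the paper's implementation).
# Score = (top-1 probability) - (top-2 probability); smaller = more uncertain.
import numpy as np

def margin_sample(probs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k pool points with the smallest margin.

    probs: array of shape (n_pool, n_classes) with predicted
           class probabilities for each unlabeled point.
    """
    # Sort each row's probabilities ascending; the last two entries
    # are the two largest class probabilities.
    sorted_probs = np.sort(probs, axis=1)
    margins = sorted_probs[:, -1] - sorted_probs[:, -2]
    # The k smallest margins identify the most ambiguous points to label.
    return np.argsort(margins)[:k]

# Toy pool of 4 unlabeled points over 3 classes (hypothetical values).
pool = np.array([
    [0.90, 0.05, 0.05],   # confident prediction -> large margin (0.85)
    [0.40, 0.35, 0.25],   # uncertain -> small margin (0.05)
    [0.48, 0.46, 0.06],   # most uncertain -> smallest margin (0.02)
    [0.70, 0.20, 0.10],   # margin 0.50
])
print(margin_sample(pool, 2))  # -> [2 1]
```

Note that margin sampling has no hyperparameters beyond the batch size `k`, which is exactly the property the abstract emphasizes for practitioners.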