Active learning through density clustering

Bibliographic Details
Published in: Expert Systems with Applications, 2017-11, Vol. 85, pp. 305-317
Authors: Wang, Min; Min, Fan; Zhang, Zhi-Heng; Wu, Yan-Xue
Format: Article
Language: English
Online Access: Full text
Description
Abstract:

Highlights:
• We propose the active learning through density clustering algorithm with three new features.
• We design a new importance measure to select representative instances deterministically.
• We employ tri-partition to determine the action to be taken on each instance.
• The new algorithm generally outperforms state-of-the-art active learning algorithms.
• The new algorithm requires only O(n) space and O(mn²) time.

Active learning is used for classification when labeling data is costly; the main challenge is to identify the critical instances that should be labeled. Clustering-based approaches take advantage of the structure of the data to select representative instances. In this paper, we developed the active learning through density peak clustering (ALEC) algorithm with three new features. First, a master tree was built to express the relationships among the nodes and to assist the growth of the cluster tree. Second, a deterministic instance selection strategy was designed using a new importance measure. Third, tri-partitioning was employed to determine the action to be taken on each instance during iterative clustering, labeling, and classifying. Experiments were performed on 14 datasets to compare against state-of-the-art active learning algorithms. Results demonstrated that the new algorithm achieved higher classification accuracy with the same number of labeled instances.
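The "new importance measure" mentioned in the abstract builds on density peak clustering. The sketch below is a rough illustration only: it implements the standard density-peaks importance score (local density times distance to the nearest denser point). The function name, the cutoff parameter dc, and this exact formula are assumptions made for illustration and may differ from the measure actually defined in the paper.

# Minimal sketch of a density-peaks-style importance measure (assumption:
# standard Rodriguez-Laio score rho * delta, not necessarily ALEC's measure).
# Note: this naive version stores the full n x n distance matrix, i.e. O(n^2)
# memory, whereas the paper reports O(n) space for ALEC itself.
import numpy as np

def importance_scores(X, dc):
    """Return an importance score for each row of X (n x d feature matrix)."""
    n = X.shape[0]
    # Pairwise Euclidean distances; O(n^2) time, matching the highlights' bound.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    # Local density: number of neighbors within the cutoff distance dc.
    rho = (dist < dc).sum(axis=1) - 1  # subtract 1 to exclude the point itself

    # Distance to the nearest point with strictly higher density.
    delta = np.empty(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        if higher.size > 0:
            delta[i] = dist[i, higher].min()
        else:
            # The densest point gets the maximum distance by convention.
            delta[i] = dist[i].max()

    # Representative instances are both locally dense and well separated.
    return rho * delta

# Usage (hypothetical): query labels for the top-k most representative instances.
# X = np.random.rand(100, 4)
# scores = importance_scores(X, dc=0.3)
# query_order = np.argsort(-scores)[:10]

Instances with the highest scores combine high local density with large separation from denser regions, which is why density-peaks-style measures are attractive for deterministically picking representative instances to label.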
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2017.05.046