Fast Randomized Semi-Supervised Clustering

Bibliographic details
Published in: Journal of physics. Conference series, 2018-06, Vol. 1036 (1), p. 12015
Main authors: Saade, Alaa; Krzakala, Florent; Lelarge, Marc; Zdeborová, Lenka
Format: Article
Language: English
Description
Abstract: We consider the problem of clustering partially labeled data from a minimal number of randomly chosen pairwise comparisons between the items. We introduce an efficient local algorithm based on a power iteration of the non-backtracking operator and study its performance on a generative model. For the case of two clusters, we give bounds on the classification error and show that a small error can be achieved from O(n) randomly chosen measurements, where n is the number of items in the dataset. Our algorithm is therefore efficient in terms of both time and space complexity. We also investigate numerically the performance of the algorithm on synthetic and real-world data.
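
The abstract gives only a high-level description, so the following is a minimal illustrative sketch rather than the authors' algorithm. It assumes two clusters labeled +1 and -1, comparisons recorded as +1 ("same cluster") or -1 ("different cluster") on a sparse set of pairs, and a few items with known labels. The function name `nb_cluster`, its parameters, the per-step rescaling, and the choice to re-inject the known labels at every iteration are all assumptions made for this example. Messages live on directed edges and are updated without backtracking, which is the idea behind a power iteration of the non-backtracking operator.

```python
def nb_cluster(n, comparisons, known, n_iter=10):
    """Estimate +1/-1 cluster labels for n items (illustrative sketch).

    comparisons: dict {(i, j): s} with s = +1 (same) or -1 (different).
    known: dict {item: label} for the partially labeled items.
    """
    # Neighbour lists of the comparison graph, with the measured sign per edge.
    nbrs = {i: {} for i in range(n)}
    for (i, j), s in comparisons.items():
        nbrs[i][j] = s
        nbrs[j][i] = s

    # One message per directed edge i -> j: evidence about j's label that
    # arrives through i, ignoring what j itself sent to i (non-backtracking).
    msg = {(i, j): 0.0 for i in nbrs for j in nbrs[i]}

    for _ in range(n_iter):
        new = {}
        for (i, j) in msg:
            incoming = sum(msg[(k, i)] for k in nbrs[i] if k != j)
            # Assumption: known labels are re-injected at every step.
            new[(i, j)] = nbrs[i][j] * (known.get(i, 0.0) + incoming)
        # Rescale so the largest message has magnitude 1, as in power iteration.
        scale = max((abs(v) for v in new.values()), default=0.0) or 1.0
        msg = {e: v / scale for e, v in new.items()}

    # Sum the incoming messages at each item and read off the sign.
    labels = []
    for i in range(n):
        total = known.get(i, 0.0) + sum(msg[(k, i)] for k in nbrs[i])
        labels.append(1 if total >= 0 else -1)
    return labels


# Noiseless toy example: six items on a chain, one known label.
truth = [1, 1, 1, -1, -1, -1]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
comparisons = {(i, j): truth[i] * truth[j] for i, j in edges}
print(nb_cluster(6, comparisons, known={0: 1}, n_iter=6))  # [1, 1, 1, -1, -1, -1]
```

In this sketch the memory used is one number per directed edge, so it grows linearly with the number of comparisons. The toy example is noiseless and says nothing about the error bounds stated in the abstract.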
ISSN: 1742-6588, 1742-6596
DOI: 10.1088/1742-6596/1036/1/012015