Sketch-and-solve approaches to k-means clustering by semidefinite programming

Bibliographic Details
Main Authors:	Clum, Charles; Mixon, Dustin G; Villar, Soledad; Xie, Kaiying
Format:	Article
Language:	English
Subjects:	Computer Science - Data Structures and Algorithms; Computer Science - Information Theory; Computer Science - Learning; Mathematics - Information Theory; Mathematics - Optimization and Control; Mathematics - Statistics Theory; Statistics - Machine Learning; Statistics - Theory

Description
Summary:	We introduce a sketch-and-solve approach to speed up the Peng-Wei semidefinite relaxation of k-means clustering. When the data is appropriately separated, we identify the k-means optimal clustering. Otherwise, our approach provides a high-confidence lower bound on the optimal k-means value. This lower bound is data-driven; it makes no assumption about the data or how it is generated. We provide code and an extensive set of numerical experiments in which we use this approach to certify approximate optimality of clustering solutions obtained by k-means++.
DOI:	10.48550/arxiv.2211.15744
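
Note on the relaxation: for orientation, the Peng-Wei semidefinite relaxation named in the summary is commonly written as below. This is a sketch in standard notation, not text from this record: the symbols n, D, X and the partition C_1, ..., C_k are assumptions introduced here, and normalization conventions (for example, dividing the objective by n) vary between papers. The sketching step and the construction of the high-confidence bound are described in the article itself and are not reproduced here.

```latex
% Assumed notation: data points x_1, ..., x_n in R^d, number of clusters k,
% D the n-by-n matrix of squared Euclidean distances, 1 the all-ones vector.
\begin{align*}
D_{ij} &= \|x_i - x_j\|_2^2, \qquad 1 \le i, j \le n \\[4pt]
% k-means value of a partition, rewritten through the normalized membership matrix X
\sum_{t=1}^{k} \sum_{i \in C_t} \Big\| x_i - \frac{1}{|C_t|} \sum_{j \in C_t} x_j \Big\|_2^2
  &= \frac{1}{2} \operatorname{tr}(DX),
  \qquad X = \sum_{t=1}^{k} \frac{1}{|C_t|} \mathbf{1}_{C_t} \mathbf{1}_{C_t}^{\top} \\[4pt]
% Relaxation: keep only the convex constraints that every such X satisfies
\text{minimize} \quad & \frac{1}{2} \operatorname{tr}(DX) \\
\text{subject to} \quad & \operatorname{tr}(X) = k, \quad X \mathbf{1} = \mathbf{1}, \quad X \ge 0 \ \text{(entrywise)}, \quad X \succeq 0
\end{align*}
```

Every partition yields a feasible X, so the optimal value of this program is a lower bound on the optimal k-means value; when the minimizer has the block form of X above, it identifies a k-means optimal clustering.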