Probably certifiably correct k-means clustering
Recently, Bandeira (C R Math, 2015 ) introduced a new type of algorithm (the so-called probably certifiably correct algorithm) that combines fast solvers with the optimality certificates provided by convex relaxations. In this paper, we devise such an algorithm for the problem of k -means clustering...
Gespeichert in:
Veröffentlicht in: | Mathematical programming 2017-10, Vol.165 (2), p.605-642 |
---|---|
Hauptverfasser: | , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Recently, Bandeira (C R Math,
2015
) introduced a new type of algorithm (the so-called probably certifiably correct algorithm) that combines fast solvers with the optimality certificates provided by convex relaxations. In this paper, we devise such an algorithm for the problem of
k
-means clustering. First, we prove that Peng and Wei’s semidefinite relaxation of
k
-means Peng and Wei (SIAM J Optim 18(1):186–205,
2007
) is tight with high probability under a distribution of planted clusters called the stochastic ball model. Our proof follows from a new dual certificate for integral solutions of this semidefinite program. Next, we show how to test the optimality of a proposed
k
-means solution using this dual certificate in quasilinear time. Finally, we analyze a version of spectral clustering from Peng and Wei (SIAM J Optim 18(1):186–205,
2007
) that is designed to solve
k
-means in the case of two clusters. In particular, we show that this quasilinear-time method typically recovers planted clusters under the stochastic ball model. |
---|---|
ISSN: | 0025-5610 1436-4646 |
DOI: | 10.1007/s10107-016-1097-0 |