Probably certifiably correct k-means clustering

Recently, Bandeira (C R Math, 2015) introduced a new type of algorithm (the so-called probably certifiably correct algorithm) that combines fast solvers with the optimality certificates provided by convex relaxations. In this paper, we devise such an algorithm for the problem of k-means clustering...
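
As background for the convex relaxation mentioned above, the k-means objective can be written as a linear function of a matrix, and the semidefinite relaxation that the full description attributes to Peng and Wei relaxes the set of such matrices. The notation here is chosen for illustration and may differ from the paper's: points x_1, ..., x_n, a partition A_1, ..., A_k of {1, ..., n}, the squared-distance matrix D, and the cluster matrix X. The constraint X ≥ 0 is entrywise and X ⪰ 0 means positive semidefinite.

```latex
\sum_{t=1}^{k}\sum_{i\in A_t}\Big\|x_i-\tfrac{1}{|A_t|}\textstyle\sum_{j\in A_t}x_j\Big\|^2
  = \sum_{t=1}^{k}\frac{1}{2|A_t|}\sum_{i,j\in A_t}\|x_i-x_j\|^2
  = \tfrac{1}{2}\operatorname{Tr}(DX),
\qquad D_{ij}=\|x_i-x_j\|^2,\quad X=\sum_{t=1}^{k}\frac{1}{|A_t|}\mathbf{1}_{A_t}\mathbf{1}_{A_t}^{\top}

\min_{X\in\mathbb{R}^{n\times n}}\ \operatorname{Tr}(DX)
\quad\text{subject to}\quad
\operatorname{Tr}(X)=k,\quad X\mathbf{1}=\mathbf{1},\quad X\ge 0,\quad X\succeq 0
```

Every partition's cluster matrix X satisfies all four constraints (its trace is k, its rows sum to 1, its entries are nonnegative, and it is a sum of rank-one positive semidefinite terms), so the relaxation's minimum is at most twice the optimal k-means value. When the minimizer is itself such a cluster matrix, the relaxation recovers an optimal partition; this is the sense in which a relaxation is called tight.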

Bibliographic details
Published in: Mathematical programming 2017-10, Vol.165 (2), p.605-642
Main authors: Iguchi, Takayuki; Mixon, Dustin G.; Peterson, Jesse; Villar, Soledad
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 642
container_issue 2
container_start_page 605
container_title Mathematical programming
container_volume 165
creator Iguchi, Takayuki
Mixon, Dustin G.
Peterson, Jesse
Villar, Soledad
description Recently, Bandeira (C R Math, 2015) introduced a new type of algorithm (the so-called probably certifiably correct algorithm) that combines fast solvers with the optimality certificates provided by convex relaxations. In this paper, we devise such an algorithm for the problem of k-means clustering. First, we prove that Peng and Wei’s semidefinite relaxation of k-means (Peng and Wei, SIAM J Optim 18(1):186–205, 2007) is tight with high probability under a distribution of planted clusters called the stochastic ball model. Our proof follows from a new dual certificate for integral solutions of this semidefinite program. Next, we show how to test the optimality of a proposed k-means solution using this dual certificate in quasilinear time. Finally, we analyze a version of spectral clustering from Peng and Wei (SIAM J Optim 18(1):186–205, 2007) that is designed to solve k-means in the case of two clusters. In particular, we show that this quasilinear-time method typically recovers planted clusters under the stochastic ball model.
doi_str_mv 10.1007/s10107-016-1097-0
format Article
publisher Springer Berlin Heidelberg
Springer Nature B.V
cop Berlin/Heidelberg
rights Springer-Verlag Berlin Heidelberg and Mathematical Optimization Society (outside the USA) 2016
Mathematical Programming is a copyright of Springer, 2017.
stitle Math. Program
date 2017-10-01
tpages 38
orcidid https://orcid.org/0000-0003-2743-7010
fulltext fulltext
identifier ISSN: 0025-5610
ispartof Mathematical programming, 2017-10, Vol.165 (2), p.605-642
issn 0025-5610
1436-4646
language eng
recordid cdi_proquest_journals_1944763400
source SpringerLink Journals; Business Source Complete
subjects Algorithms
Calculus of Variations and Optimal Control
Optimization
Cluster analysis
Clustering
Clusters
Combinatorics
Full Length Paper
Mathematical and Computational Physics
Mathematical Methods in Physics
Mathematics
Mathematics and Statistics
Mathematics of Computing
Numerical Analysis
Solvers
Theoretical
Vector quantization
title Probably certifiably correct k-means clustering
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T08%3A22%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Probably%20certifiably%20correct%20k-means%20clustering&rft.jtitle=Mathematical%20programming&rft.au=Iguchi,%20Takayuki&rft.date=2017-10-01&rft.volume=165&rft.issue=2&rft.spage=605&rft.epage=642&rft.pages=605-642&rft.issn=0025-5610&rft.eissn=1436-4646&rft_id=info:doi/10.1007/s10107-016-1097-0&rft_dat=%3Cproquest_cross%3E1944763400%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1944763400&rft_id=info:pmid/&rfr_iscdi=true
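
The description refers to planted clusters drawn from a "stochastic ball model" and to integral solutions of Peng and Wei's semidefinite program. The following is a minimal sketch, under stated assumptions, of how those two ideas fit together; it is not the paper's dual certificate or its spectral method. It assumes each planted cluster is drawn uniformly from a unit ball around a chosen center (uniform is one choice of rotation-invariant distribution, picked here for illustration). It builds the cluster matrix X = sum_t (1/|A_t|) 1_{A_t} 1_{A_t}^T of the planted partition and checks that half of Tr(DX), with D the squared-distance matrix, equals the k-means objective. All function names are hypothetical helpers, not an API from the paper.

```python
import math
import random


def sample_stochastic_ball(centers, n_per_ball, rng):
    """Draw n_per_ball points from the unit ball around each center.

    Assumption: points are uniform in each ball (direction from a normalized
    Gaussian vector, radius U**(1/m) so the density is uniform in volume).
    """
    points, labels = [], []
    for a, c in enumerate(centers):
        m = len(c)
        for _ in range(n_per_ball):
            g = [rng.gauss(0.0, 1.0) for _ in range(m)]
            norm = math.sqrt(sum(v * v for v in g))
            r = rng.random() ** (1.0 / m)
            points.append([ci + r * gi / norm for ci, gi in zip(c, g)])
            labels.append(a)
    return points, labels


def kmeans_objective(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for a in set(labels):
        cluster = [p for p, lab in zip(points, labels) if lab == a]
        m = len(cluster[0])
        centroid = [sum(p[d] for p in cluster) / len(cluster) for d in range(m)]
        total += sum(sum((p[d] - centroid[d]) ** 2 for d in range(m)) for p in cluster)
    return total


def squared_distance_matrix(points):
    """D[i][j] = ||x_i - x_j||^2."""
    return [[sum((u - v) ** 2 for u, v in zip(p, q)) for q in points] for p in points]


def integral_sdp_point(labels):
    """Cluster matrix: entry (i, j) is 1/|A_t| if i and j lie in cluster t, else 0."""
    sizes = {a: labels.count(a) for a in set(labels)}
    return [[1.0 / sizes[li] if li == lj else 0.0 for lj in labels] for li in labels]


def trace_inner(D, X):
    """Tr(DX) for symmetric matrices, computed as sum_{i,j} D_ij X_ij."""
    return sum(d * x for drow, xrow in zip(D, X) for d, x in zip(drow, xrow))
```

For example, with two planted balls centered at (0, 0) and (3, 0), `kmeans_objective(points, labels)` and `0.5 * trace_inner(D, X)` should agree up to floating-point rounding. The trace of X should equal 2 and every row of X should sum to 1, which are the equality constraints of the relaxation. The pure-Python loops are quadratic in the number of points and only meant for small illustrative inputs.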