Probably certifiably correct k-means clustering
Recently, Bandeira (C R Math, 2015) introduced a new type of algorithm (the so-called probably certifiably correct algorithm) that combines fast solvers with the optimality certificates provided by convex relaxations. In this paper, we devise such an algorithm for the problem of k-means clustering...
Published in: | Mathematical programming 2017-10, Vol.165 (2), p.605-642 |
---|---|
Main authors: | Iguchi, Takayuki; Mixon, Dustin G.; Peterson, Jesse; Villar, Soledad |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 642 |
---|---|
container_issue | 2 |
container_start_page | 605 |
container_title | Mathematical programming |
container_volume | 165 |
creator | Iguchi, Takayuki; Mixon, Dustin G.; Peterson, Jesse; Villar, Soledad |
description | Recently, Bandeira (C R Math, 2015) introduced a new type of algorithm (the so-called probably certifiably correct algorithm) that combines fast solvers with the optimality certificates provided by convex relaxations. In this paper, we devise such an algorithm for the problem of k-means clustering. First, we prove that Peng and Wei’s semidefinite relaxation of k-means (Peng and Wei, SIAM J Optim 18(1):186–205, 2007) is tight with high probability under a distribution of planted clusters called the stochastic ball model. Our proof follows from a new dual certificate for integral solutions of this semidefinite program. Next, we show how to test the optimality of a proposed k-means solution using this dual certificate in quasilinear time. Finally, we analyze a version of spectral clustering from Peng and Wei (SIAM J Optim 18(1):186–205, 2007) that is designed to solve k-means in the case of two clusters. In particular, we show that this quasilinear-time method typically recovers planted clusters under the stochastic ball model. |
doi_str_mv | 10.1007/s10107-016-1097-0 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_1944763400</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1944763400</sourcerecordid><originalsourceid>FETCH-LOGICAL-c316t-cd5399a6c0bae40dccf3853e5fce67a19672b813db55d91fe38b25b1be1892be3</originalsourceid><addsrcrecordid>eNp1kD1PwzAQhi0EEqHwA9giMZvexR-JR1QBRaoEA8yW7VxQStsEOx3670kUBhame4f3Q_cwdotwjwDlMiEglBxQcwQzijOWoRSaSy31OcsACsWVRrhkVyltAQBFVWVs-RY77_zulAeKQ9u0s-5ipDDkX3xP7pDysDumgWJ7-LxmF43bJbr5vQv28fT4vlrzzevzy-phw4NAPfBQK2GM0wG8Iwl1CI2olCDVBNKlQ6PLwlcoaq9UbbAhUflCefSElSk8iQW7m3v72H0fKQ122x3jYZy0aKQstZAAowtnV4hdSpEa28d27-LJItiJi5252JGLnbjYKVPMmdRPD1H80_xv6AcoumTy</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1944763400</pqid></control><display><type>article</type><title>Probably certifiably correct k-means clustering</title><source>SpringerLink Journals</source><source>Business Source Complete</source><creator>Iguchi, Takayuki ; Mixon, Dustin G. ; Peterson, Jesse ; Villar, Soledad</creator><creatorcontrib>Iguchi, Takayuki ; Mixon, Dustin G. ; Peterson, Jesse ; Villar, Soledad</creatorcontrib><description>Recently, Bandeira (C R Math,
2015
) introduced a new type of algorithm (the so-called probably certifiably correct algorithm) that combines fast solvers with the optimality certificates provided by convex relaxations. In this paper, we devise such an algorithm for the problem of
k
-means clustering. First, we prove that Peng and Wei’s semidefinite relaxation of
k
-means Peng and Wei (SIAM J Optim 18(1):186–205,
2007
) is tight with high probability under a distribution of planted clusters called the stochastic ball model. Our proof follows from a new dual certificate for integral solutions of this semidefinite program. Next, we show how to test the optimality of a proposed
k
-means solution using this dual certificate in quasilinear time. Finally, we analyze a version of spectral clustering from Peng and Wei (SIAM J Optim 18(1):186–205,
2007
) that is designed to solve
k
-means in the case of two clusters. In particular, we show that this quasilinear-time method typically recovers planted clusters under the stochastic ball model.</description><identifier>ISSN: 0025-5610</identifier><identifier>EISSN: 1436-4646</identifier><identifier>DOI: 10.1007/s10107-016-1097-0</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Algorithms ; Calculus of Variations and Optimal Control; Optimization ; Cluster analysis ; Clustering ; Clusters ; Combinatorics ; Full Length Paper ; Mathematical and Computational Physics ; Mathematical Methods in Physics ; Mathematics ; Mathematics and Statistics ; Mathematics of Computing ; Numerical Analysis ; Solvers ; Theoretical ; Vector quantization</subject><ispartof>Mathematical programming, 2017-10, Vol.165 (2), p.605-642</ispartof><rights>Springer-Verlag Berlin Heidelberg and Mathematical Optimization Society (outside the USA) 2016</rights><rights>Mathematical Programming is a copyright of Springer, 2017.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c316t-cd5399a6c0bae40dccf3853e5fce67a19672b813db55d91fe38b25b1be1892be3</citedby><cites>FETCH-LOGICAL-c316t-cd5399a6c0bae40dccf3853e5fce67a19672b813db55d91fe38b25b1be1892be3</cites><orcidid>0000-0003-2743-7010</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10107-016-1097-0$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10107-016-1097-0$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27901,27902,41464,42533,51294</link.rule.ids></links><search><creatorcontrib>Iguchi, Takayuki</creatorcontrib><creatorcontrib>Mixon, Dustin G.</creatorcontrib><creatorcontrib>Peterson, 
Jesse</creatorcontrib><creatorcontrib>Villar, Soledad</creatorcontrib><title>Probably certifiably correct k-means clustering</title><title>Mathematical programming</title><addtitle>Math. Program</addtitle><description>Recently, Bandeira (C R Math,
2015
) introduced a new type of algorithm (the so-called probably certifiably correct algorithm) that combines fast solvers with the optimality certificates provided by convex relaxations. In this paper, we devise such an algorithm for the problem of
k
-means clustering. First, we prove that Peng and Wei’s semidefinite relaxation of
k
-means Peng and Wei (SIAM J Optim 18(1):186–205,
2007
) is tight with high probability under a distribution of planted clusters called the stochastic ball model. Our proof follows from a new dual certificate for integral solutions of this semidefinite program. Next, we show how to test the optimality of a proposed
k
-means solution using this dual certificate in quasilinear time. Finally, we analyze a version of spectral clustering from Peng and Wei (SIAM J Optim 18(1):186–205,
2007
) that is designed to solve
k
-means in the case of two clusters. In particular, we show that this quasilinear-time method typically recovers planted clusters under the stochastic ball model.</description><subject>Algorithms</subject><subject>Calculus of Variations and Optimal Control; Optimization</subject><subject>Cluster analysis</subject><subject>Clustering</subject><subject>Clusters</subject><subject>Combinatorics</subject><subject>Full Length Paper</subject><subject>Mathematical and Computational Physics</subject><subject>Mathematical Methods in Physics</subject><subject>Mathematics</subject><subject>Mathematics and Statistics</subject><subject>Mathematics of Computing</subject><subject>Numerical Analysis</subject><subject>Solvers</subject><subject>Theoretical</subject><subject>Vector quantization</subject><issn>0025-5610</issn><issn>1436-4646</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNp1kD1PwzAQhi0EEqHwA9giMZvexR-JR1QBRaoEA8yW7VxQStsEOx3670kUBhame4f3Q_cwdotwjwDlMiEglBxQcwQzijOWoRSaSy31OcsACsWVRrhkVyltAQBFVWVs-RY77_zulAeKQ9u0s-5ipDDkX3xP7pDysDumgWJ7-LxmF43bJbr5vQv28fT4vlrzzevzy-phw4NAPfBQK2GM0wG8Iwl1CI2olCDVBNKlQ6PLwlcoaq9UbbAhUflCefSElSk8iQW7m3v72H0fKQ122x3jYZy0aKQstZAAowtnV4hdSpEa28d27-LJItiJi5252JGLnbjYKVPMmdRPD1H80_xv6AcoumTy</recordid><startdate>20171001</startdate><enddate>20171001</enddate><creator>Iguchi, Takayuki</creator><creator>Mixon, Dustin G.</creator><creator>Peterson, Jesse</creator><creator>Villar, Soledad</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature 
B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>88I</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L6V</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>M2P</scope><scope>M7S</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PTHSS</scope><scope>Q9U</scope><orcidid>https://orcid.org/0000-0003-2743-7010</orcidid></search><sort><creationdate>20171001</creationdate><title>Probably certifiably correct k-means clustering</title><author>Iguchi, Takayuki ; Mixon, Dustin G. 
; Peterson, Jesse ; Villar, Soledad</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c316t-cd5399a6c0bae40dccf3853e5fce67a19672b813db55d91fe38b25b1be1892be3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Algorithms</topic><topic>Calculus of Variations and Optimal Control; Optimization</topic><topic>Cluster analysis</topic><topic>Clustering</topic><topic>Clusters</topic><topic>Combinatorics</topic><topic>Full Length Paper</topic><topic>Mathematical and Computational Physics</topic><topic>Mathematical Methods in Physics</topic><topic>Mathematics</topic><topic>Mathematics and Statistics</topic><topic>Mathematics of Computing</topic><topic>Numerical Analysis</topic><topic>Solvers</topic><topic>Theoretical</topic><topic>Vector quantization</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Iguchi, Takayuki</creatorcontrib><creatorcontrib>Mixon, Dustin G.</creatorcontrib><creatorcontrib>Peterson, Jesse</creatorcontrib><creatorcontrib>Villar, Soledad</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Science Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni 
Edition)</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>ProQuest Engineering Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>Science Database</collection><collection>Engineering Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Engineering 
Collection</collection><collection>ProQuest Central Basic</collection><jtitle>Mathematical programming</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Iguchi, Takayuki</au><au>Mixon, Dustin G.</au><au>Peterson, Jesse</au><au>Villar, Soledad</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Probably certifiably correct k-means clustering</atitle><jtitle>Mathematical programming</jtitle><stitle>Math. Program</stitle><date>2017-10-01</date><risdate>2017</risdate><volume>165</volume><issue>2</issue><spage>605</spage><epage>642</epage><pages>605-642</pages><issn>0025-5610</issn><eissn>1436-4646</eissn><abstract>Recently, Bandeira (C R Math,
2015
) introduced a new type of algorithm (the so-called probably certifiably correct algorithm) that combines fast solvers with the optimality certificates provided by convex relaxations. In this paper, we devise such an algorithm for the problem of
k
-means clustering. First, we prove that Peng and Wei’s semidefinite relaxation of
k
-means Peng and Wei (SIAM J Optim 18(1):186–205,
2007
) is tight with high probability under a distribution of planted clusters called the stochastic ball model. Our proof follows from a new dual certificate for integral solutions of this semidefinite program. Next, we show how to test the optimality of a proposed
k
-means solution using this dual certificate in quasilinear time. Finally, we analyze a version of spectral clustering from Peng and Wei (SIAM J Optim 18(1):186–205,
2007
) that is designed to solve
k
-means in the case of two clusters. In particular, we show that this quasilinear-time method typically recovers planted clusters under the stochastic ball model.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/s10107-016-1097-0</doi><tpages>38</tpages><orcidid>https://orcid.org/0000-0003-2743-7010</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0025-5610 |
ispartof | Mathematical programming, 2017-10, Vol.165 (2), p.605-642 |
issn | 0025-5610; 1436-4646 |
language | eng |
recordid | cdi_proquest_journals_1944763400 |
source | SpringerLink Journals; Business Source Complete |
subjects | Algorithms; Calculus of Variations and Optimal Control; Optimization; Cluster analysis; Clustering; Clusters; Combinatorics; Full Length Paper; Mathematical and Computational Physics; Mathematical Methods in Physics; Mathematics; Mathematics and Statistics; Mathematics of Computing; Numerical Analysis; Solvers; Theoretical; Vector quantization |
title | Probably certifiably correct k-means clustering |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T08%3A22%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Probably%20certifiably%20correct%20k-means%20clustering&rft.jtitle=Mathematical%20programming&rft.au=Iguchi,%20Takayuki&rft.date=2017-10-01&rft.volume=165&rft.issue=2&rft.spage=605&rft.epage=642&rft.pages=605-642&rft.issn=0025-5610&rft.eissn=1436-4646&rft_id=info:doi/10.1007/s10107-016-1097-0&rft_dat=%3Cproquest_cross%3E1944763400%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1944763400&rft_id=info:pmid/&rfr_iscdi=true |