Hierarchical Clustering via Spreading Metrics

We study the cost function for hierarchical clusterings introduced by [arXiv:1510.05043] where hierarchies are treated as first-class objects rather than deriving their cost from projections into flat clusters. It was also shown in [arXiv:1510.05043] that a top-down algorithm returns a hierarchical...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	arXiv.org 2016-10
Hauptverfasser:	Aurko Roy, Pokutta, Sebastian
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Approximation Cluster analysis Clustering Combinatorial analysis Cost function Hierarchies Integer programming Linear programming Mathematical analysis Rounding
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Aurko Roy Pokutta, Sebastian
description	We study the cost function for hierarchical clusterings introduced by [arXiv:1510.05043] where hierarchies are treated as first-class objects rather than deriving their cost from projections into flat clusters. It was also shown in [arXiv:1510.05043] that a top-down algorithm returns a hierarchical clustering of cost at most $O\left(\alpha_n \log n\right)$ times the cost of the optimal hierarchical clustering, where $\alpha_n$ is the approximation ratio of the Sparsest Cut subroutine used. Thus using the best known approximation algorithm for Sparsest Cut due to Arora-Rao-Vazirani, the top down algorithm returns a hierarchical clustering of cost at most $O\left(\log^{3/2} n\right)$ times the cost of the optimal solution. We improve this by giving an $O(\log{n})$-approximation algorithm for this problem. Our main technical ingredients are a combinatorial characterization of ultrametrics induced by this cost function, deriving an Integer Linear Programming (ILP) formulation for this family of ultrametrics, and showing how to iteratively round an LP relaxation of this formulation by using the idea of \emph{sphere growing} which has been extensively used in the context of graph partitioning. We also prove that our algorithm returns an $O(\log{n})$-approximate hierarchical clustering for a generalization of this cost function also studied in [arXiv:1510.05043]. Experiments show that the hierarchies found by using the ILP formulation as well as our rounding algorithm often have better projections into flat clusters than the standard linkage based algorithms. We also give constant factor inapproximability results for this problem.
format	Article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2080322237</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2080322237</sourcerecordid><originalsourceid>FETCH-proquest_journals_20803222373</originalsourceid><addsrcrecordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mTQ9chMLUosSs7ITE7MUXDOKS0uSS3KzEtXKMtMVAguKEpNTAHxfFNLijKTi3kYWNMSc4pTeaE0N4Oym2uIs4duQVF-YWlqcUl8Vn5pUR5QKt7IwMLA2MjIyNjcmDhVAMoSMf8</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2080322237</pqid></control><display><type>article</type><title>Hierarchical Clustering via Spreading Metrics</title><source>Freely Accessible Journals</source><creator>Aurko Roy ; Pokutta, Sebastian</creator><creatorcontrib>Aurko Roy ; Pokutta, Sebastian</creatorcontrib><description>We study the cost function for hierarchical clusterings introduced by [arXiv:1510.05043] where hierarchies are treated as first-class objects rather than deriving their cost from projections into flat clusters. It was also shown in [arXiv:1510.05043] that a top-down algorithm returns a hierarchical clustering of cost at most $O\left(\alpha_n \log n\right)$ times the cost of the optimal hierarchical clustering, where $\alpha_n$ is the approximation ratio of the Sparsest Cut subroutine used. Thus using the best known approximation algorithm for Sparsest Cut due to Arora-Rao-Vazirani, the top down algorithm returns a hierarchical clustering of cost at most $O\left(\log^{3/2} n\right)$ times the cost of the optimal solution. We improve this by giving an $O(\log{n})$-approximation algorithm for this problem. Our main technical ingredients are a combinatorial characterization of ultrametrics induced by this cost function, deriving an Integer Linear Programming (ILP) formulation for this family of ultrametrics, and showing how to iteratively round an LP relaxation of this formulation by using the idea of \emph{sphere growing} which has been extensively used in the context of graph partitioning. We also prove that our algorithm returns an $O(\log{n})$-approximate hierarchical clustering for a generalization of this cost function also studied in [arXiv:1510.05043]. Experiments show that the hierarchies found by using the ILP formulation as well as our rounding algorithm often have better projections into flat clusters than the standard linkage based algorithms. We also give constant factor inapproximability results for this problem.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Approximation ; Cluster analysis ; Clustering ; Combinatorial analysis ; Cost function ; Hierarchies ; Integer programming ; Linear programming ; Mathematical analysis ; Rounding</subject><ispartof>arXiv.org, 2016-10</ispartof><rights>2016. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>781,785</link.rule.ids></links><search><creatorcontrib>Aurko Roy</creatorcontrib><creatorcontrib>Pokutta, Sebastian</creatorcontrib><title>Hierarchical Clustering via Spreading Metrics</title><title>arXiv.org</title><description>We study the cost function for hierarchical clusterings introduced by [arXiv:1510.05043] where hierarchies are treated as first-class objects rather than deriving their cost from projections into flat clusters. It was also shown in [arXiv:1510.05043] that a top-down algorithm returns a hierarchical clustering of cost at most $O\left(\alpha_n \log n\right)$ times the cost of the optimal hierarchical clustering, where $\alpha_n$ is the approximation ratio of the Sparsest Cut subroutine used. Thus using the best known approximation algorithm for Sparsest Cut due to Arora-Rao-Vazirani, the top down algorithm returns a hierarchical clustering of cost at most $O\left(\log^{3/2} n\right)$ times the cost of the optimal solution. We improve this by giving an $O(\log{n})$-approximation algorithm for this problem. Our main technical ingredients are a combinatorial characterization of ultrametrics induced by this cost function, deriving an Integer Linear Programming (ILP) formulation for this family of ultrametrics, and showing how to iteratively round an LP relaxation of this formulation by using the idea of \emph{sphere growing} which has been extensively used in the context of graph partitioning. We also prove that our algorithm returns an $O(\log{n})$-approximate hierarchical clustering for a generalization of this cost function also studied in [arXiv:1510.05043]. Experiments show that the hierarchies found by using the ILP formulation as well as our rounding algorithm often have better projections into flat clusters than the standard linkage based algorithms. We also give constant factor inapproximability results for this problem.</description><subject>Algorithms</subject><subject>Approximation</subject><subject>Cluster analysis</subject><subject>Clustering</subject><subject>Combinatorial analysis</subject><subject>Cost function</subject><subject>Hierarchies</subject><subject>Integer programming</subject><subject>Linear programming</subject><subject>Mathematical analysis</subject><subject>Rounding</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mTQ9chMLUosSs7ITE7MUXDOKS0uSS3KzEtXKMtMVAguKEpNTAHxfFNLijKTi3kYWNMSc4pTeaE0N4Oym2uIs4duQVF-YWlqcUl8Vn5pUR5QKt7IwMLA2MjIyNjcmDhVAMoSMf8</recordid><startdate>20161028</startdate><enddate>20161028</enddate><creator>Aurko Roy</creator><creator>Pokutta, Sebastian</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20161028</creationdate><title>Hierarchical Clustering via Spreading Metrics</title><author>Aurko Roy ; Pokutta, Sebastian</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_20803222373</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Algorithms</topic><topic>Approximation</topic><topic>Cluster analysis</topic><topic>Clustering</topic><topic>Combinatorial analysis</topic><topic>Cost function</topic><topic>Hierarchies</topic><topic>Integer programming</topic><topic>Linear programming</topic><topic>Mathematical analysis</topic><topic>Rounding</topic><toplevel>online_resources</toplevel><creatorcontrib>Aurko Roy</creatorcontrib><creatorcontrib>Pokutta, Sebastian</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Aurko Roy</au><au>Pokutta, Sebastian</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Hierarchical Clustering via Spreading Metrics</atitle><jtitle>arXiv.org</jtitle><date>2016-10-28</date><risdate>2016</risdate><eissn>2331-8422</eissn><abstract>We study the cost function for hierarchical clusterings introduced by [arXiv:1510.05043] where hierarchies are treated as first-class objects rather than deriving their cost from projections into flat clusters. It was also shown in [arXiv:1510.05043] that a top-down algorithm returns a hierarchical clustering of cost at most $O\left(\alpha_n \log n\right)$ times the cost of the optimal hierarchical clustering, where $\alpha_n$ is the approximation ratio of the Sparsest Cut subroutine used. Thus using the best known approximation algorithm for Sparsest Cut due to Arora-Rao-Vazirani, the top down algorithm returns a hierarchical clustering of cost at most $O\left(\log^{3/2} n\right)$ times the cost of the optimal solution. We improve this by giving an $O(\log{n})$-approximation algorithm for this problem. Our main technical ingredients are a combinatorial characterization of ultrametrics induced by this cost function, deriving an Integer Linear Programming (ILP) formulation for this family of ultrametrics, and showing how to iteratively round an LP relaxation of this formulation by using the idea of \emph{sphere growing} which has been extensively used in the context of graph partitioning. We also prove that our algorithm returns an $O(\log{n})$-approximate hierarchical clustering for a generalization of this cost function also studied in [arXiv:1510.05043]. Experiments show that the hierarchies found by using the ILP formulation as well as our rounding algorithm often have better projections into flat clusters than the standard linkage based algorithms. We also give constant factor inapproximability results for this problem.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2016-10
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2080322237
source	Freely Accessible Journals
subjects	Algorithms Approximation Cluster analysis Clustering Combinatorial analysis Cost function Hierarchies Integer programming Linear programming Mathematical analysis Rounding
title	Hierarchical Clustering via Spreading Metrics
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-14T04%3A44%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Hierarchical%20Clustering%20via%20Spreading%20Metrics&rft.jtitle=arXiv.org&rft.au=Aurko%20Roy&rft.date=2016-10-28&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2080322237%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2080322237&rft_id=info:pmid/&rfr_iscdi=true