Polynomial-time approximation schemes for geometric min-sum median clustering

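Bibliographic details
Published in: Journal of the ACM, 2002-03, Vol. 49 (2), pp. 139-156
Main authors: Ostrovsky, Rafail; Rabani, Yuval
Format: Article
Language: English
Subjects: Approximation; Clustering; Clusters; Cubes; Distortion; Functions (mathematics); Linear transformations; Mathematical analysis
Online access: Full text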
container_end_page 156
container_issue 2
container_start_page 139
container_title Journal of the ACM
container_volume 49
creator Ostrovsky, Rafail
Rabani, Yuval
description The Johnson–Lindenstrauss lemma states that $n$ points in a high-dimensional Hilbert space can be embedded with small distortion of the distances into an $O(\log n)$-dimensional space by applying a random linear transformation. We show that similar (though weaker) properties hold for certain random linear transformations over the Hamming cube. We use these transformations to solve NP-hard clustering problems in the cube as well as in geometric settings. More specifically, we address the following clustering problem. Given $n$ points in a larger set (e.g., $\mathbb{R}^d$) endowed with a distance function (e.g., $L_2$ distance), we would like to partition the data set into $k$ disjoint clusters, each with a "cluster center," so as to minimize the sum over all data points of the distance between the point and the center of the cluster containing the point. The problem is provably NP-hard in some high-dimensional geometric settings, even for $k = 2$. We give polynomial-time approximation schemes for this problem in several settings, including the binary cube $\{0,1\}^d$ with Hamming distance, and $\mathbb{R}^d$ with either $L_1$ distance, $L_2$ distance, or the square of $L_2$ distance. In all these settings, the best previous results were constant-factor approximation guarantees. We note that our problem is similar in flavor to the $k$-median problem (and the related facility location problem), which has been considered in graph-theoretic and fixed-dimensional geometric settings, where it becomes hard when $k$ is part of the input. In contrast, we study the problem when $k$ is fixed but the dimension is part of the input.
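The min-sum clustering objective described in the abstract can be made concrete with a small sketch. This is not the authors' approximation scheme: it only evaluates the cost $\sum_{x} \min_{c} \mathrm{dist}(x, c)$ for given centers and, for tiny inputs, finds the best split into $k = 2$ clusters by exhaustive search. Each cluster uses the standard closed-form optimal center (coordinate-wise majority for Hamming distance, coordinate-wise median for $L_1$, mean for squared $L_2$). All function names are hypothetical. Plain $L_2$ is omitted because its optimal center (the geometric median) has no closed form, and the exhaustive search takes time exponential in $n$, so it serves only as a reference baseline.

```python
import math
from statistics import mean, median
from typing import Callable, Dict, List, Sequence, Tuple

Point = Sequence[float]


def hamming(x: Point, y: Point) -> float:
    # Number of coordinates where two points of {0,1}^d differ.
    return float(sum(a != b for a, b in zip(x, y)))


def l1(x: Point, y: Point) -> float:
    return float(sum(abs(a - b) for a, b in zip(x, y)))


def sq_l2(x: Point, y: Point) -> float:
    # Square of the Euclidean (L2) distance.
    return float(sum((a - b) ** 2 for a, b in zip(x, y)))


DISTANCES: Dict[str, Callable[[Point, Point], float]] = {
    "hamming": hamming,
    "l1": l1,
    "sq_l2": sq_l2,
}


def min_sum_cost(points: List[Point], centers: List[Point], metric: str) -> float:
    """Sum over all points of the distance to the nearest center.

    For fixed centers, sending each point to its nearest center is the
    cheapest way to form the clusters."""
    dist = DISTANCES[metric]
    return sum(min(dist(p, c) for c in centers) for p in points)


def best_center(cluster: List[Point], metric: str) -> List[float]:
    """Closed-form optimal center of one cluster for the given metric."""
    columns = list(zip(*cluster))  # one tuple of values per coordinate
    if metric == "hamming":
        # Coordinate-wise majority bit (ties broken toward 0).
        return [1 if 2 * sum(col) > len(col) else 0 for col in columns]
    if metric == "l1":
        # A coordinate-wise median minimizes the sum of absolute deviations.
        return [median(col) for col in columns]
    if metric == "sq_l2":
        # The mean minimizes the sum of squared Euclidean distances.
        return [mean(col) for col in columns]
    raise ValueError(f"unsupported metric: {metric}")


def brute_force_two_clusters(
    points: List[Point], metric: str
) -> Tuple[float, List[List[Point]]]:
    """Exact optimum for k = 2 by trying every split (exponential in n)."""
    dist = DISTANCES[metric]
    n = len(points)
    best_cost: float = math.inf
    best_split: List[List[Point]] = []
    # Point 0 always stays in the first cluster, so each split is tried once;
    # starting at mask 1 keeps the second cluster nonempty.
    for mask in range(1, 2 ** (n - 1)):
        first = [points[0]] + [points[i] for i in range(1, n) if not (mask >> (i - 1)) & 1]
        second = [points[i] for i in range(1, n) if (mask >> (i - 1)) & 1]
        cost = 0.0
        for cluster in (first, second):
            center = best_center(cluster, metric)
            cost += sum(dist(p, center) for p in cluster)
        if cost < best_cost:
            best_cost, best_split = cost, [first, second]
    return best_cost, best_split
```

For example, on the four points 000, 001, 111, 110 of $\{0,1\}^3$ with Hamming distance, `brute_force_two_clusters` returns cost 2.0 with clusters {000, 001} and {111, 110}.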
doi_str_mv 10.1145/506147.506149
format Article
fulltext fulltext
identifier ISSN: 0004-5411
ispartof Journal of the ACM, 2002-03, Vol.49 (2), p.139-156
issn 0004-5411
1557-735X
language eng
recordid cdi_proquest_miscellaneous_28999024
source ACM Digital Library Complete
subjects Approximation
Clustering
Clusters
Cubes
Distortion
Functions (mathematics)
Linear transformations
Mathematical analysis
title Polynomial-time approximation schemes for geometric min-sum median clustering
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-11T05%3A22%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Polynomial-time%20approximation%20schemes%20for%20geometric%20min-sum%20median%20clustering&rft.jtitle=Journal%20of%20the%20ACM&rft.au=Ostrovsky,%20Rafail&rft.date=2002-03-01&rft.volume=49&rft.issue=2&rft.spage=139&rft.epage=156&rft.pages=139-156&rft.issn=0004-5411&rft.eissn=1557-735X&rft_id=info:doi/10.1145/506147.506149&rft_dat=%3Cproquest_cross%3E1808079958%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1808079958&rft_id=info:pmid/&rfr_iscdi=true