Polynomial-time approximation schemes for geometric min-sum median clustering
Published in: | Journal of the ACM 2002-03, Vol.49 (2), p.139-156 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | The Johnson--Lindenstrauss lemma states that $n$ points in a high-dimensional Hilbert space can be embedded with small distortion of the distances into an $O(\log n)$-dimensional space by applying a random linear transformation. We show that similar (though weaker) properties hold for certain random linear transformations over the Hamming cube. We use these transformations to solve NP-hard clustering problems in the cube as well as in geometric settings. More specifically, we address the following clustering problem. Given $n$ points in a larger set (e.g., $\mathbb{R}^d$) endowed with a distance function (e.g., $L_2$ distance), we would like to partition the data set into $k$ disjoint clusters, each with a "cluster center," so as to minimize the sum over all data points of the distance between the point and the center of the cluster containing the point. The problem is provably NP-hard in some high-dimensional geometric settings, even for $k = 2$. We give polynomial-time approximation schemes for this problem in several settings, including the binary cube $\{0,1\}^d$ with Hamming distance, and $\mathbb{R}^d$ either with $L_1$ distance, or with $L_2$ distance, or with the square of $L_2$ distance. In all these settings, the best previous results were constant-factor approximation guarantees. We note that our problem is similar in flavor to the $k$-median problem (and the related facility location problem), which has been considered in graph-theoretic and fixed-dimensional geometric settings, where it becomes hard when $k$ is part of the input. In contrast, we study the problem when $k$ is fixed, but the dimension is part of the input. |
---|---|
ISSN: | 0004-5411 1557-735X |
DOI: | 10.1145/506147.506149 |
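
The clustering objective described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's approximation scheme: it only evaluates the min-sum cost for given centers and, for tiny inputs, finds optimal centers by exhaustive search. The function names (`hamming`, `min_sum_cost`, `brute_force_centers`) are hypothetical, and the sketch assumes that centers are restricted to points of the binary cube $\{0,1\}^d$ under Hamming distance.

```python
from itertools import product


def hamming(p, q):
    """Hamming distance between two equal-length 0/1 tuples."""
    return sum(a != b for a, b in zip(p, q))


def min_sum_cost(points, centers, dist=hamming):
    """Sum over all points of the distance to the nearest center.

    For fixed centers, assigning each point to its nearest center is the
    cheapest partition, so this equals the objective from the abstract.
    """
    return sum(min(dist(p, c) for c in centers) for p in points)


def brute_force_centers(points, k, d, dist=hamming):
    """Try every k-tuple of centers in {0,1}^d (assumption: centers lie in the cube).

    The search visits 2**(d*k) candidates, so it is usable only for very small
    d and k; it serves as a reference point, not as a practical algorithm.
    """
    cube = list(product((0, 1), repeat=d))
    best = min(product(cube, repeat=k),
               key=lambda cs: min_sum_cost(points, cs, dist))
    return best, min_sum_cost(points, best, dist)


if __name__ == "__main__":
    pts = [(0, 0, 0), (0, 0, 1), (1, 1, 1), (1, 1, 0)]
    centers, cost = brute_force_centers(pts, k=2, d=3)
    print(centers, cost)
```

The exhaustive search is exponential in the dimension, which is the regime the abstract singles out: $k$ is fixed while the dimension is part of the input.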