Faster Projective Clustering Approximation of Big Data
In projective clustering we are given a set of n points in $R^d$ and wish to cluster them to a set $S$ of $k$ linear subspaces in $R^d$ according to some given distance function. An $\eps$-coreset for this problem is a weighted (scaled) subset of the input points such that for every such possible $S...
Gespeichert in:
Hauptverfasser: | , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | In projective clustering we are given a set of n points in $R^d$ and wish to
cluster them to a set $S$ of $k$ linear subspaces in $R^d$ according to some
given distance function. An $\eps$-coreset for this problem is a weighted
(scaled) subset of the input points such that for every such possible $S$ the
sum of these distances is approximated up to a factor of $(1+\eps)$. We suggest
to reduce the size of existing coresets by suggesting the first $O(\log(m))$
approximation for the case of $m$ lines clustering in $O(ndm)$ time, compared
to the existing $\exp(m)$ solution. We then project the points on these lines
and prove that for a sufficiently large $m$ we obtain a coreset for projective
clustering. Our algorithm also generalize to handle outliers. Experimental
results and open code are also provided. |
---|---|
DOI: | 10.48550/arxiv.2011.13476 |