Faster Projective Clustering Approximation of Big Data

In projective clustering we are given a set of n points in \(R^d\) and wish to cluster them to a set \(S\) of \(k\) linear subspaces in \(R^d\) according to some given distance function. An \(\eps\)-coreset for this problem is a weighted (scaled) subset of the input points such that for every such p...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	arXiv.org 2020-11
Hauptverfasser:	Statman, Adiel, Rozenberg, Liat, Feldman, Dan
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Approximation Clustering Mathematical analysis Outliers (statistics) Subspaces
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	In projective clustering we are given a set of n points in \(R^d\) and wish to cluster them to a set \(S\) of \(k\) linear subspaces in \(R^d\) according to some given distance function. An \(\eps\)-coreset for this problem is a weighted (scaled) subset of the input points such that for every such possible \(S\) the sum of these distances is approximated up to a factor of \((1+\eps)\). We suggest to reduce the size of existing coresets by suggesting the first \(O(\log(m))\) approximation for the case of \(m\) lines clustering in \(O(ndm)\) time, compared to the existing \(\exp(m)\) solution. We then project the points on these lines and prove that for a sufficiently large \(m\) we obtain a coreset for projective clustering. Our algorithm also generalize to handle outliers. Experimental results and open code are also provided.
ISSN:	2331-8422