Random Projections for k-Means: Maintaining Coresets Beyond Merge & Reduce
Format: Article
Language: English
Abstract: We give a new construction for a small-space summary satisfying the coreset guarantee of a data set with respect to the $k$-means objective function. The number of points required in an offline construction is in $\tilde{O}(k \epsilon^{-2}\min(d,k\epsilon^{-2}))$, which is minimal among all available constructions.

Aside from two constructions with exponential dependence on the dimension, all known coresets are maintained in data streams via the merge-and-reduce framework, which incurs a large space dependence on $\log n$. Instead, our construction crucially relies on Johnson-Lindenstrauss-type embeddings which, combined with results from online algorithms, give us a new technique for efficiently maintaining coresets in data streams without relying on merge and reduce. The final number of points stored by our algorithm in a data stream is in $\tilde{O}(k^2 \epsilon^{-2} \log^2 n \min(d,k\epsilon^{-2}))$.
DOI: 10.48550/arxiv.1504.01584
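
The central ingredient named in the abstract is a Johnson-Lindenstrauss-type embedding that reduces the dimension of the input points before a coreset is built or maintained. The snippet below is a minimal sketch of such a random projection, assuming a plain Gaussian sketch matrix and an illustrative choice of target dimension; the paper's actual embedding, parameters, and streaming maintenance are not shown here.

```python
# Minimal sketch of a Johnson-Lindenstrauss-type random projection.
# The function name, the Gaussian sketch, and the target-dimension choice
# are illustrative assumptions, not the paper's exact construction.
import numpy as np

def jl_project(points: np.ndarray, target_dim: int, rng=None) -> np.ndarray:
    """Project an (n x d) point set to target_dim dimensions with a scaled Gaussian matrix."""
    rng = np.random.default_rng(rng)
    n, d = points.shape
    # Entries drawn N(0, 1/target_dim) so squared norms are preserved in expectation,
    # which keeps pairwise distances (and hence k-means costs) approximately intact.
    proj = rng.normal(0.0, 1.0 / np.sqrt(target_dim), size=(d, target_dim))
    return points @ proj

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 512))          # synthetic high-dimensional points
    eps = 0.5
    # Illustrative JL-style target dimension on the order of eps^-2 * log n.
    m = int(np.ceil(8 * np.log(len(X)) / eps ** 2))
    Y = jl_project(X, min(m, X.shape[1]), rng=1)
    print(X.shape, "->", Y.shape)             # coreset construction would then run on Y
```

In this reading, the projection replaces the dependence on the ambient dimension $d$ with a dependence on the reduced dimension, which is where the $\min(d, k\epsilon^{-2})$ factor in the stated bounds comes from.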