Optimal Fast Johnson-Lindenstrauss Embeddings for Large Data Sets
| Main Authors: | , |
| --- | --- |
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online Access: | Order full text |
Summary:

Johnson-Lindenstrauss embeddings are widely used to reduce the dimension, and thus the processing time, of data. To reduce the total complexity, fast algorithms for applying these embeddings are also necessary. To date, such fast algorithms are available only for a non-optimal embedding dimension, or up to a certain threshold on the number of data points.

We address a variant of this problem in which one aims to simultaneously embed larger subsets of the data set. Our method follows an approach by Nelson: a subsampled Hadamard transform maps the points into a space of lower, but not yet optimal, dimension; subsequently, a random matrix with independent entries projects down to the optimal embedding dimension (illustrated in the sketch after this record).

For subsets whose size scales at least polynomially in the ambient dimension, the complexity of this method comes close to the number of operations needed just to read the data, under mild assumptions on the size of the data set that are considerably less restrictive than in previous works. We also prove a lower bound showing that subsampled Hadamard matrices alone cannot reach an optimal embedding dimension; hence, the second embedding cannot be omitted.
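For context, the "optimal embedding dimension" above refers to the target dimension $m = O(\varepsilon^{-2} \log N)$ from the classical Johnson-Lindenstrauss lemma, which is known to be optimal in general. Stated for reference:

```latex
% Johnson-Lindenstrauss lemma: for any accuracy 0 < eps < 1 and any
% N points x_1, ..., x_N in R^d, there is a map f : R^d -> R^m with
% m = O(eps^{-2} log N) such that, for all pairs i, j,
\[
(1-\varepsilon)\,\lVert x_i - x_j\rVert_2^2
\;\le\; \lVert f(x_i) - f(x_j)\rVert_2^2
\;\le\; (1+\varepsilon)\,\lVert x_i - x_j\rVert_2^2 .
\]
```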
DOI: 10.48550/arxiv.1712.01774
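The two-stage construction described in the summary can be illustrated in code. The following is a minimal sketch under simplifying assumptions, not the paper's exact construction: stage 1 is a subsampled randomized Hadamard transform (SRHT), stage 2 a dense Gaussian projection. The function names (`fwht`, `two_stage_jl`) and the dimension parameters `m_mid` and `m_final` are illustrative choices; the paper's analysis dictates how these dimensions must scale with the accuracy and the number of points.

```python
import numpy as np

def fwht(a):
    """Fast Walsh-Hadamard transform along the last axis, O(d log d).
    Requires the last dimension d to be a power of two."""
    a = np.asarray(a, dtype=float).copy()
    d = a.shape[-1]
    assert d & (d - 1) == 0, "dimension must be a power of two"
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            x = a[..., i:i + h].copy()
            y = a[..., i + h:i + 2 * h].copy()
            a[..., i:i + h] = x + y
            a[..., i + h:i + 2 * h] = x - y
        h *= 2
    return a

def two_stage_jl(X, m_mid, m_final, rng):
    """Sketch of a two-stage embedding: SRHT to intermediate dimension
    m_mid, then an i.i.d. Gaussian projection to dimension m_final."""
    n, d = X.shape
    # Stage 1: subsampled randomized Hadamard transform (SRHT).
    signs = rng.choice([-1.0, 1.0], size=d)          # random sign diagonal D
    HDx = fwht(X * signs) / np.sqrt(d)               # orthonormal Hadamard transform
    rows = rng.choice(d, size=m_mid, replace=False)  # uniformly subsampled coordinates
    Y = np.sqrt(d / m_mid) * HDx[:, rows]            # rescale to preserve norms in expectation
    # Stage 2: dense random matrix with independent Gaussian entries,
    # projecting to the (smaller) final embedding dimension.
    G = rng.normal(0.0, 1.0 / np.sqrt(m_final), size=(m_mid, m_final))
    return Y @ G

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1024))                     # 100 points in R^1024
Z = two_stage_jl(X, m_mid=256, m_final=64, rng=rng)
# Norms are approximately preserved (up to the distortion epsilon):
print(np.median(np.linalg.norm(Z, axis=1) / np.linalg.norm(X, axis=1)))
```

The division of labor mirrors the abstract: stage 1 costs O(d log d) per point via the fast transform but cannot, by the paper's lower bound, reach the optimal dimension on its own; stage 2 is a dense matrix multiply, but applied only in the already-reduced dimension m_mid, so its cost stays small relative to reading the data.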