Structured orthogonal random features based on DCT for kernel approximation
Published in: | Neurocomputing (Amsterdam) 2024-12, Vol.610, p.128640, Article 128640 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Summary: | Random Fourier features are a popular technique used to improve nonlinear kernel methods in large-scale problems. Recent studies have shown that replacing the random Gaussian matrices of random feature maps with appropriately scaled random orthogonal matrices, as in SORF, can significantly improve kernel approximation performance. However, the SORF method theoretically requires zero-padding for datasets whose input feature dimension does not satisfy d = 2^p for some p ∈ N+, which increases memory use and computation time. To address this limitation, we propose a new structured orthogonal random features method based on the discrete cosine transform (SORF-DCT) to approximate Gaussian kernel functions. The SORF-DCT method does not require the input feature dimension to be d = 2^p. The key to SORF-DCT is that we combine the DCT matrix with diagonal "sign-flipping" matrices and scale the product appropriately using a diagonal matrix whose diagonal elements follow the chi distribution, yielding a structured orthogonal matrix, whose elements follow the standard Gaussian distribution, in place of a random Gaussian matrix. SORF-DCT generates D-dimensional structured orthogonal random features in O(D log d) time and O(D) memory by employing the fast discrete cosine transform, where d and D denote the input feature dimension and the expansion dimension, respectively. We prove that SORF-DCT is an unbiased estimator of the Gaussian kernel function and analyze the concentration error bounds. Experimental results on eleven benchmark datasets show that SORF-DCT achieves lower kernel approximation errors, requires less time, and has comparable nonlinear learning capabilities. |
---|---|
ISSN: | 0925-2312 |
DOI: | 10.1016/j.neucom.2024.128640 |
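
The construction described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no formula, so the function name `sorf_dct_features`, the use of three DCT/sign-flip rounds per block (borrowed from the usual SORF recipe), the cosine/sine feature map, and all parameter names are assumptions made for illustration. It applies an orthonormal DCT-II via `scipy.fft.dct`, random ±1 sign flips, and chi-distributed row scaling, and works for any input dimension d without zero-padding (for example d = 10, which a power-of-two scheme would pad to 16).

```python
import numpy as np
from scipy.fft import dct


def sorf_dct_features(X, D, sigma=1.0, n_rounds=3, seed=0):
    """Hypothetical sketch of DCT-based structured orthogonal random features.

    X: (n, d) data matrix; D: number of random projections;
    sigma: Gaussian kernel bandwidth. Returns an (n, 2 * D) feature matrix.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_blocks = -(-D // d)  # ceil(D / d): stack independent d x d blocks
    projections = []
    for _ in range(n_blocks):
        Z = X.copy()
        # Assumption: three rounds of "sign flip, then orthonormal DCT".
        for _ in range(n_rounds):
            signs = rng.choice([-1.0, 1.0], size=d)
            Z = dct(Z * signs, type=2, norm="ortho", axis=1)
        # Chi-distributed scaling (d degrees of freedom), so each row norm
        # matches the norm of a d-dimensional standard Gaussian vector.
        scale = np.sqrt(rng.chisquare(df=d, size=d))
        projections.append(Z * scale / sigma)
    P = np.hstack(projections)[:, :D]
    # Cosine/sine map, so that Phi @ Phi.T approximates
    # exp(-||x - y||^2 / (2 * sigma^2)).
    return np.hstack([np.cos(P), np.sin(P)]) / np.sqrt(D)
```

A usage example under the same assumptions: for `X` of shape `(20, 10)`, `Phi = sorf_dct_features(X, D=1024)` has shape `(20, 2048)`, and `Phi @ Phi.T` can be compared entrywise against the exact Gaussian kernel matrix to gauge the approximation error.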