Efficient Sparse PCA via Block-Diagonalization
Main authors: , ,
Format: Article
Language: eng
Subjects:
Online access: Order full text
Abstract: Sparse Principal Component Analysis (Sparse PCA) is a pivotal tool
in data analysis and dimensionality reduction. However, Sparse PCA is a
challenging problem in both theory and practice: it is known to be NP-hard,
and current exact methods generally require exponential runtime. In this
paper, we propose a novel framework to efficiently approximate Sparse PCA by
(i) approximating the general input covariance matrix with a re-sorted
block-diagonal matrix, (ii) solving the Sparse PCA sub-problem in each block,
and (iii) reconstructing the solution to the original problem. Our framework
is simple and powerful: it can leverage any off-the-shelf Sparse PCA
algorithm and achieve significant computational speedups, with a minor
additive error that is linear in the approximation error of the
block-diagonal matrix. Suppose $g(k, d)$ is the runtime of an algorithm that
(approximately) solves Sparse PCA in dimension $d$ with sparsity value $k$.
Our framework, when integrated with this algorithm, reduces the runtime to
$\mathcal{O}\left(\frac{d}{d^\star} \cdot g(k, d^\star) + d^2\right)$, where
$d^\star \leq d$ is the largest block size of the block-diagonal matrix. For
instance, integrating our framework with the Branch-and-Bound algorithm
reduces the complexity from $g(k, d) = \mathcal{O}(k^3 \cdot d^k)$ to
$\mathcal{O}(k^3 \cdot d \cdot (d^\star)^{k-1})$, an exponential speedup when
$d^\star$ is small. We perform large-scale evaluations on many real-world
datasets: for an exact Sparse PCA algorithm, our method achieves an average
speedup factor of 93.77 while maintaining an average approximation error of
2.15%; for an approximate Sparse PCA algorithm, our method achieves an
average speedup factor of 6.77 and an average approximation error of merely
0.37%.
DOI: 10.48550/arxiv.2410.14092
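
The abstract describes a three-step pipeline: block-diagonalize the
covariance, solve Sparse PCA within each block, and reconstruct a solution in
the original dimension. The Python sketch below illustrates one way such a
pipeline could look; it is not the authors' implementation. The
`solve_sparse_pca` callable, the `threshold` parameter, and the
thresholding-plus-connected-components step for identifying blocks are
assumptions made for illustration, and the reconstruction here simply keeps
the single best block, a simplification of the reconstruction step described
in the paper.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def block_diagonal_sparse_pca(Sigma, k, threshold, solve_sparse_pca):
    """Approximate the leading k-sparse principal component of Sigma.

    `solve_sparse_pca(block, k)` stands in for any off-the-shelf solver and is
    assumed to return (unit-norm k-sparse vector, explained variance).
    """
    d = Sigma.shape[0]

    # (i) Block-diagonal approximation: zero out small off-diagonal entries
    #     and group variables into blocks via connected components of the
    #     remaining support graph (this implicitly re-sorts variables into
    #     blocks).
    support = csr_matrix(np.abs(Sigma) >= threshold)
    n_blocks, labels = connected_components(support, directed=False)

    best_val, best_x = -np.inf, np.zeros(d)
    for b in range(n_blocks):
        idx = np.flatnonzero(labels == b)

        # (ii) Solve the Sparse PCA sub-problem restricted to this block.
        x_sub, val = solve_sparse_pca(Sigma[np.ix_(idx, idx)], min(k, idx.size))

        # (iii) Lift the block solution back to dimension d; keep the block
        #       whose solution explains the most variance (a simplification of
        #       the paper's reconstruction step).
        if val > best_val:
            best_val = val
            best_x = np.zeros(d)
            best_x[idx] = x_sub
    return best_x, best_val
```

The cost structure of this sketch mirrors the abstract's
$\mathcal{O}\left(\frac{d}{d^\star} \cdot g(k, d^\star) + d^2\right)$ bound:
thresholding the covariance accounts for the $d^2$ term, and since no block
exceeds size $d^\star$, the per-block solves together contribute at most
$\frac{d}{d^\star} \cdot g(k, d^\star)$ (assuming $g$ grows at least linearly
in the dimension).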