Empirical Bayes PCA in high dimensions

Bibliographic Details
Published in: Journal of the Royal Statistical Society. Series B, Statistical Methodology, 2022-07, Vol. 84 (3), pp. 853-878
Main Authors: Zhong, Xinyi; Su, Chang; Fan, Zhou
Format: Article
Language: English
Description
Abstract: When the dimension of data is comparable to or larger than the number of data samples, principal components analysis (PCA) may exhibit problematic high‐dimensional noise. In this work, we propose an empirical Bayes PCA method that reduces this noise by estimating a joint prior distribution for the principal components. EB‐PCA is based on the classical Kiefer–Wolfowitz non‐parametric maximum likelihood estimator for empirical Bayes estimation, distributional results derived from random matrix theory for the sample PCs and iterative refinement using an approximate message passing (AMP) algorithm. In theoretical ‘spiked’ models, EB‐PCA achieves Bayes‐optimal estimation accuracy in the same settings as an oracle Bayes AMP procedure that knows the true priors. Empirically, EB‐PCA significantly improves over PCA when there is strong prior structure, both in simulation and on quantitative benchmarks constructed from the 1000 Genomes Project and the International HapMap Project. An illustration is presented for analysis of gene expression data obtained by single‐cell RNA‐seq.
ISSN: 1369-7412, 1467-9868
DOI: 10.1111/rssb.12490
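
As a rough illustration of the empirical Bayes ingredient named in the abstract above, the sketch below implements a grid-based EM approximation to the Kiefer–Wolfowitz non-parametric maximum likelihood estimator, together with the posterior-mean denoiser it induces, for a simple Gaussian sequence model. The function names, the grid resolution and the unit-noise sequence-model setting are assumptions chosen for illustration; this is not the authors' EB-PCA implementation, which additionally calibrates the sample PCs via random matrix theory and refines them with AMP.

import numpy as np

def kw_npmle(y, grid, n_iter=200):
    # Grid-based EM approximation to the Kiefer-Wolfowitz NPMLE of a prior G,
    # assuming observations y_i = mu_i + N(0, 1) with mu_i ~ G (unit noise for simplicity).
    w = np.full(len(grid), 1.0 / len(grid))                   # uniform initial mixing weights
    lik = np.exp(-0.5 * (y[:, None] - grid[None, :]) ** 2)    # Gaussian likelihood of each grid atom
    for _ in range(n_iter):
        post = lik * w                                         # unnormalised posterior over grid atoms
        post /= post.sum(axis=1, keepdims=True)
        w = post.mean(axis=0)                                  # EM update of the mixing weights
    return w

def posterior_mean(y, grid, w):
    # Empirical Bayes posterior-mean denoiser under the estimated discrete prior.
    lik = np.exp(-0.5 * (y[:, None] - grid[None, :]) ** 2) * w
    return (lik * grid).sum(axis=1) / lik.sum(axis=1)

rng = np.random.default_rng(0)
mu = rng.choice([-2.0, 0.0, 2.0], size=5000, p=[0.3, 0.4, 0.3])  # hypothetical three-point prior
y = mu + rng.standard_normal(mu.size)                            # noisy observations
grid = np.linspace(y.min(), y.max(), 100)
w_hat = kw_npmle(y, grid)
mu_hat = posterior_mean(y, grid, w_hat)
print("raw observation MSE:", np.mean((y - mu) ** 2))
print("EB posterior-mean MSE:", np.mean((mu_hat - mu) ** 2))

In EB-PCA this kind of denoiser is applied, after appropriate rescaling, to the entries of the sample principal components, whose approximately Gaussian fluctuations around the true PCs are characterised by the random matrix results mentioned in the abstract. When the prior has strong structure, the posterior-mean estimate should show noticeably lower error than the raw observations, which is the kind of gain the abstract reports over plain PCA.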