ePCA: HIGH DIMENSIONAL EXPONENTIAL FAMILY PCA

Bibliographic details
Published in: The Annals of Applied Statistics, 2018-12, Vol. 12 (4), p. 2121-2150
Main authors: Liu, Lydia T.; Dobriban, Edgar; Singer, Amit
Format: Article
Language: English
Online access: Full text
Description
Summary: Many applications involve large datasets with entries from exponential family distributions. Our main motivating application is photon-limited imaging, where we observe images with Poisson distributed pixels. We focus on X-ray Free Electron Lasers (XFEL), a quickly developing technology whose goal is to reconstruct molecular structure. In XFEL, estimating the principal components of the noiseless distribution is needed for denoising and for structure determination. However, the standard method, Principal Component Analysis (PCA), can be inefficient in non-Gaussian noise. Motivated by this application, we develop ePCA (exponential family PCA), a new methodology for PCA on exponential families. ePCA is a fast method that can be used very generally for dimension reduction and denoising of large data matrices with exponential family entries. We conduct a substantive XFEL data analysis using ePCA. We show that ePCA estimates the PCs of the distribution of images more accurately than PCA and alternatives. Importantly, it also leads to better denoising. We also provide theoretical justification for our estimator, including the convergence rate and the Marchenko–Pastur law in high dimensions. An open-source implementation is available.
ISSN: 1932-6157, 1941-7330
DOI: 10.1214/18-AOAS1146