Empirical Bayes PCA in high dimensions

When the dimension of data is comparable to or larger than the number of data samples, principal components analysis (PCA) may exhibit problematic high‐dimensional noise. In this work, we propose an empirical Bayes PCA method that reduces this noise by estimating a joint prior distribution for the p...

Bibliographic details
Published in: Journal of the Royal Statistical Society. Series B, Statistical methodology, 2022-07, Vol.84 (3), p.853-878
Main authors: Zhong, Xinyi; Su, Chang; Fan, Zhou
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 878
container_issue 3
container_start_page 853
container_title Journal of the Royal Statistical Society. Series B, Statistical methodology
container_volume 84
creator Zhong, Xinyi
Su, Chang
Fan, Zhou
description When the dimension of data is comparable to or larger than the number of data samples, principal components analysis (PCA) may exhibit problematic high‐dimensional noise. In this work, we propose an empirical Bayes PCA method that reduces this noise by estimating a joint prior distribution for the principal components. EB‐PCA is based on the classical Kiefer–Wolfowitz non‐parametric maximum likelihood estimator for empirical Bayes estimation, distributional results derived from random matrix theory for the sample PCs and iterative refinement using an approximate message passing (AMP) algorithm. In theoretical ‘spiked’ models, EB‐PCA achieves Bayes‐optimal estimation accuracy in the same settings as an oracle Bayes AMP procedure that knows the true priors. Empirically, EB‐PCA significantly improves over PCA when there is strong prior structure, both in simulation and on quantitative benchmarks constructed from the 1000 Genomes Project and the International HapMap Project. An illustration is presented for analysis of gene expression data obtained by single‐cell RNA‐seq.
doi_str_mv 10.1111/rssb.12490
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2694717251</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2694717251</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3700-f526ee203a63f5354ebf52dafd752eb82b87af5f5f7e98972db989a7011233423</originalsourceid><addsrcrecordid>eNp9kE1LAzEQhoMoWKsXf8GC4EHYmu9sju1SbaGgWD2H7G5iU_bLxFL235u6nn3n8A7DMzPwAnCL4AxFPfoQihnCVMIzMEGUi1RmPDuPPeEyFRThS3AVwh5GcUEm4H7Z9M67UtfJQg8mJK_5PHFtsnOfu6RyjWmD69pwDS6sroO5-fMp-HhavuerdPPyvM7nm7QkAsLUMsyNwZBoTiwjjJoijiptK8GwKTJcZEJbFksYmUmBqyKaFhAhTAjFZAruxru9774OJnyrfXfwbXypMJdUIIEZitTDSJW-C8Ebq3rvGu0HhaA65aBOOajfHCKMRvjoajP8Q6q37XYx7vwAYYVdjQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2694717251</pqid></control><display><type>article</type><title>Empirical Bayes PCA in high dimensions</title><source>Oxford University Press Journals All Titles (1996-Current)</source><source>Wiley Online Library Journals Frontfile Complete</source><source>EBSCOhost Business Source Complete</source><creator>Zhong, Xinyi ; Su, Chang ; Fan, Zhou</creator><creatorcontrib>Zhong, Xinyi ; Su, Chang ; Fan, Zhou</creatorcontrib><description>When the dimension of data is comparable to or larger than the number of data samples, principal components analysis (PCA) may exhibit problematic high‐dimensional noise. In this work, we propose an empirical Bayes PCA method that reduces this noise by estimating a joint prior distribution for the principal components. EB‐PCA is based on the classical Kiefer–Wolfowitz non‐parametric maximum likelihood estimator for empirical Bayes estimation, distributional results derived from random matrix theory for the sample PCs and iterative refinement using an approximate message passing (AMP) algorithm. 
In theoretical ‘spiked’ models, EB‐PCA achieves Bayes‐optimal estimation accuracy in the same settings as an oracle Bayes AMP procedure that knows the true priors. Empirically, EB‐PCA significantly improves over PCA when there is strong prior structure, both in simulation and on quantitative benchmarks constructed from the 1000 Genomes Project and the International HapMap Project. An illustration is presented for analysis of gene expression data obtained by single‐cell RNA‐seq.</description><identifier>ISSN: 1369-7412</identifier><identifier>EISSN: 1467-9868</identifier><identifier>DOI: 10.1111/rssb.12490</identifier><language>eng</language><publisher>Oxford: Oxford University Press</publisher><subject>Algorithms ; AMP algorithms ; Empirical analysis ; empirical Bayes ; Estimation ; Gene expression ; Genomes ; Genomics ; Iterative methods ; Matrix theory ; Maximum likelihood estimators ; Message passing ; Noise ; Principal components analysis ; random matrix theory ; Regression analysis ; Simulation ; Statistical methods ; Statistics</subject><ispartof>Journal of the Royal Statistical Society. 
Series B, Statistical methodology, 2022-07, Vol.84 (3), p.853-878</ispartof><rights>2022 Royal Statistical Society</rights><rights>Copyright © 2022 The Royal Statistical Society and Blackwell Publishing Ltd</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3700-f526ee203a63f5354ebf52dafd752eb82b87af5f5f7e98972db989a7011233423</citedby><cites>FETCH-LOGICAL-c3700-f526ee203a63f5354ebf52dafd752eb82b87af5f5f7e98972db989a7011233423</cites><orcidid>0000-0002-5940-4697 ; 0000-0002-8704-1512</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1111%2Frssb.12490$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1111%2Frssb.12490$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,776,780,1411,27901,27902,45550,45551</link.rule.ids></links><search><creatorcontrib>Zhong, Xinyi</creatorcontrib><creatorcontrib>Su, Chang</creatorcontrib><creatorcontrib>Fan, Zhou</creatorcontrib><title>Empirical Bayes PCA in high dimensions</title><title>Journal of the Royal Statistical Society. Series B, Statistical methodology</title><description>When the dimension of data is comparable to or larger than the number of data samples, principal components analysis (PCA) may exhibit problematic high‐dimensional noise. In this work, we propose an empirical Bayes PCA method that reduces this noise by estimating a joint prior distribution for the principal components. EB‐PCA is based on the classical Kiefer–Wolfowitz non‐parametric maximum likelihood estimator for empirical Bayes estimation, distributional results derived from random matrix theory for the sample PCs and iterative refinement using an approximate message passing (AMP) algorithm. 
In theoretical ‘spiked’ models, EB‐PCA achieves Bayes‐optimal estimation accuracy in the same settings as an oracle Bayes AMP procedure that knows the true priors. Empirically, EB‐PCA significantly improves over PCA when there is strong prior structure, both in simulation and on quantitative benchmarks constructed from the 1000 Genomes Project and the International HapMap Project. An illustration is presented for analysis of gene expression data obtained by single‐cell RNA‐seq.</description><subject>Algorithms</subject><subject>AMP algorithms</subject><subject>Empirical analysis</subject><subject>empirical Bayes</subject><subject>Estimation</subject><subject>Gene expression</subject><subject>Genomes</subject><subject>Genomics</subject><subject>Iterative methods</subject><subject>Matrix theory</subject><subject>Maximum likelihood estimators</subject><subject>Message passing</subject><subject>Noise</subject><subject>Principal components analysis</subject><subject>random matrix theory</subject><subject>Regression analysis</subject><subject>Simulation</subject><subject>Statistical methods</subject><subject>Statistics</subject><issn>1369-7412</issn><issn>1467-9868</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9kE1LAzEQhoMoWKsXf8GC4EHYmu9sju1SbaGgWD2H7G5iU_bLxFL235u6nn3n8A7DMzPwAnCL4AxFPfoQihnCVMIzMEGUi1RmPDuPPeEyFRThS3AVwh5GcUEm4H7Z9M67UtfJQg8mJK_5PHFtsnOfu6RyjWmD69pwDS6sroO5-fMp-HhavuerdPPyvM7nm7QkAsLUMsyNwZBoTiwjjJoijiptK8GwKTJcZEJbFksYmUmBqyKaFhAhTAjFZAruxru9774OJnyrfXfwbXypMJdUIIEZitTDSJW-C8Ebq3rvGu0HhaA65aBOOajfHCKMRvjoajP8Q6q37XYx7vwAYYVdjQ</recordid><startdate>202207</startdate><enddate>202207</enddate><creator>Zhong, Xinyi</creator><creator>Su, Chang</creator><creator>Fan, Zhou</creator><general>Oxford University 
Press</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8BJ</scope><scope>8FD</scope><scope>FQK</scope><scope>JBE</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-5940-4697</orcidid><orcidid>https://orcid.org/0000-0002-8704-1512</orcidid></search><sort><creationdate>202207</creationdate><title>Empirical Bayes PCA in high dimensions</title><author>Zhong, Xinyi ; Su, Chang ; Fan, Zhou</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3700-f526ee203a63f5354ebf52dafd752eb82b87af5f5f7e98972db989a7011233423</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Algorithms</topic><topic>AMP algorithms</topic><topic>Empirical analysis</topic><topic>empirical Bayes</topic><topic>Estimation</topic><topic>Gene expression</topic><topic>Genomes</topic><topic>Genomics</topic><topic>Iterative methods</topic><topic>Matrix theory</topic><topic>Maximum likelihood estimators</topic><topic>Message passing</topic><topic>Noise</topic><topic>Principal components analysis</topic><topic>random matrix theory</topic><topic>Regression analysis</topic><topic>Simulation</topic><topic>Statistical methods</topic><topic>Statistics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhong, Xinyi</creatorcontrib><creatorcontrib>Su, Chang</creatorcontrib><creatorcontrib>Fan, Zhou</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>International Bibliography of the Social Sciences (IBSS)</collection><collection>Technology Research Database</collection><collection>International Bibliography of the Social Sciences</collection><collection>International Bibliography of the Social Sciences</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced 
Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Journal of the Royal Statistical Society. Series B, Statistical methodology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhong, Xinyi</au><au>Su, Chang</au><au>Fan, Zhou</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Empirical Bayes PCA in high dimensions</atitle><jtitle>Journal of the Royal Statistical Society. Series B, Statistical methodology</jtitle><date>2022-07</date><risdate>2022</risdate><volume>84</volume><issue>3</issue><spage>853</spage><epage>878</epage><pages>853-878</pages><issn>1369-7412</issn><eissn>1467-9868</eissn><abstract>When the dimension of data is comparable to or larger than the number of data samples, principal components analysis (PCA) may exhibit problematic high‐dimensional noise. In this work, we propose an empirical Bayes PCA method that reduces this noise by estimating a joint prior distribution for the principal components. EB‐PCA is based on the classical Kiefer–Wolfowitz non‐parametric maximum likelihood estimator for empirical Bayes estimation, distributional results derived from random matrix theory for the sample PCs and iterative refinement using an approximate message passing (AMP) algorithm. In theoretical ‘spiked’ models, EB‐PCA achieves Bayes‐optimal estimation accuracy in the same settings as an oracle Bayes AMP procedure that knows the true priors. Empirically, EB‐PCA significantly improves over PCA when there is strong prior structure, both in simulation and on quantitative benchmarks constructed from the 1000 Genomes Project and the International HapMap Project. 
An illustration is presented for analysis of gene expression data obtained by single‐cell RNA‐seq.</abstract><cop>Oxford</cop><pub>Oxford University Press</pub><doi>10.1111/rssb.12490</doi><tpages>26</tpages><orcidid>https://orcid.org/0000-0002-5940-4697</orcidid><orcidid>https://orcid.org/0000-0002-8704-1512</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1369-7412
ispartof Journal of the Royal Statistical Society. Series B, Statistical methodology, 2022-07, Vol.84 (3), p.853-878
issn 1369-7412
1467-9868
language eng
recordid cdi_proquest_journals_2694717251
source Oxford University Press Journals All Titles (1996-Current); Wiley Online Library Journals Frontfile Complete; EBSCOhost Business Source Complete
subjects Algorithms
AMP algorithms
Empirical analysis
empirical Bayes
Estimation
Gene expression
Genomes
Genomics
Iterative methods
Matrix theory
Maximum likelihood estimators
Message passing
Noise
Principal components analysis
random matrix theory
Regression analysis
Simulation
Statistical methods
Statistics
title Empirical Bayes PCA in high dimensions
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-21T21%3A25%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Empirical%20Bayes%20PCA%20in%20high%20dimensions&rft.jtitle=Journal%20of%20the%20Royal%20Statistical%20Society.%20Series%20B,%20Statistical%20methodology&rft.au=Zhong,%20Xinyi&rft.date=2022-07&rft.volume=84&rft.issue=3&rft.spage=853&rft.epage=878&rft.pages=853-878&rft.issn=1369-7412&rft.eissn=1467-9868&rft_id=info:doi/10.1111/rssb.12490&rft_dat=%3Cproquest_cross%3E2694717251%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2694717251&rft_id=info:pmid/&rfr_iscdi=true