Spectral analysis of high-dimensional sample covariance matrices with missing observations

We study high-dimensional sample covariance matrices based on independent random vectors with missing coordinates. The presence of missing observations is common in modern applications such as climate studies or gene expression micro-arrays. A weak approximation on the spectral distribution in the ...

Bibliographic details
Published in: Bernoulli : official journal of the Bernoulli Society for Mathematical Statistics and Probability 2017-11, Vol.23 (4A), p.2466-2532
Main authors: JURCZAK, KAMIL; ROHDE, ANGELIKA
Format: Article
Language: English
Online access: Full text
container_end_page 2532
container_issue 4A
container_start_page 2466
container_title Bernoulli : official journal of the Bernoulli Society for Mathematical Statistics and Probability
container_volume 23
creator JURCZAK, KAMIL
ROHDE, ANGELIKA
description We study high-dimensional sample covariance matrices based on independent random vectors with missing coordinates. The presence of missing observations is common in modern applications such as climate studies or gene expression micro-arrays. A weak approximation on the spectral distribution in the "large dimension d and large sample size n" asymptotics is derived for possibly different observation probabilities in the coordinates. The spectral distribution turns out to be strongly influenced by the missingness mechanism. In the null case under the missing at random scenario where each component is observed with the same probability p, the limiting spectral distribution is a Marčenko–Pastur law shifted by (1 – p)/p to the left. As d/n → y ∈ (0, 1), the almost sure convergence of the extremal eigenvalues to the respective boundary points of the support of the limiting spectral distribution is proved, which are explicitly given in terms of y and p. Eventually, the sample covariance matrix is positive definite if p is larger than 1 – (1 – √y)², whereas this is not true any longer if p is smaller than this quantity.
doi_str_mv 10.3150/16-BEJ815
format Article
fullrecord <record><control><sourceid>jstor_cross</sourceid><recordid>TN_cdi_crossref_primary_10_3150_16_BEJ815</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><jstor_id>26492031</jstor_id><sourcerecordid>26492031</sourcerecordid><originalsourceid>FETCH-LOGICAL-c286t-e256d2f6d6cb55853ebe6d6189f559c6e6f764f58fe1fa0b7c8904482fff22273</originalsourceid><addsrcrecordid>eNo9kD1PwzAYhD2ARCkM_AAkrwwB24nfOCNU5UuVGICFJXKc142rfFR-o6L-e4KCmE53eu6GY-xKittUanEnIXlYvxqpT9hCplokuQJ9xs6JdkLIDEAs2Nf7Ht0Ybcttb9sjBeKD503YNkkdOuwpDFPOyXb7FrkbDjYG2zvknR1jcEj8O4wN7wJR6Ld8qAjjwY5Tiy7Yqbct4eWfLtnn4_pj9Zxs3p5eVvebxCkDY4JKQ6081OAqrY1OscLJSFN4rQsHCD6HzGvjUXorqtyZQmSZUd57pVSeLtnNvOviQBTRl_sYOhuPpRTl7xGlhHI-YmKvZ3ZH4xD_QQVZoUQq0x_C2l4B</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Spectral analysis of high-dimensional sample covariance matrices with missing observations</title><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>JSTOR Mathematics &amp; Statistics</source><source>Jstor Complete Legacy</source><source>Project Euclid Complete</source><creator>JURCZAK, KAMIL ; ROHDE, ANGELIKA</creator><creatorcontrib>JURCZAK, KAMIL ; ROHDE, ANGELIKA</creatorcontrib><description>We study high-dimensional sample covariance matrices based on independent random vectors with missing coordinates. The presence of missing observations is common in modern applications such as climate studies or gene expression micro-arrays. A weak approximation on the spectral distribution in the "large dimension d and large sample size n" asymptotics is derived for possibly different observation probabilities in the coordinates. The spectral distribution turns out to be strongly influenced by the missingness mechanism. 
In the null case under the missing at random scenario where each component is observed with the same probability p, the limiting spectral distribution is a Marčenko–Pastur law shifted by (1 – p)/p to the left. As d/n → y ϵ (0, 1), the almost sure convergence of the extremal eigenvalues to the respective boundary points of the support of the limiting spectral distribution is proved, which are explicitly given in terms of y and p. Eventually, the sample covariance matrix is positive definite if p is larger than 1 – (1 – √y)2, whereas this is not true any longer if p is smaller than this quantity.</description><identifier>ISSN: 1350-7265</identifier><identifier>DOI: 10.3150/16-BEJ815</identifier><language>eng</language><publisher>INTERNATIONAL STATISTICAL INSTITUTE</publisher><ispartof>Bernoulli : official journal of the Bernoulli Society for Mathematical Statistics and Probability, 2017-11, Vol.23 (4A), p.2466-2532</ispartof><rights>2017 International Statistical Institute/Bernoulli Society</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c286t-e256d2f6d6cb55853ebe6d6189f559c6e6f764f58fe1fa0b7c8904482fff22273</citedby><cites>FETCH-LOGICAL-c286t-e256d2f6d6cb55853ebe6d6189f559c6e6f764f58fe1fa0b7c8904482fff22273</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.jstor.org/stable/pdf/26492031$$EPDF$$P50$$Gjstor$$H</linktopdf><linktohtml>$$Uhttps://www.jstor.org/stable/26492031$$EHTML$$P50$$Gjstor$$H</linktohtml><link.rule.ids>314,778,782,801,830,27911,27912,58004,58008,58237,58241</link.rule.ids></links><search><creatorcontrib>JURCZAK, KAMIL</creatorcontrib><creatorcontrib>ROHDE, ANGELIKA</creatorcontrib><title>Spectral analysis of high-dimensional sample covariance matrices with missing observations</title><title>Bernoulli : official 
journal of the Bernoulli Society for Mathematical Statistics and Probability</title><description>We study high-dimensional sample covariance matrices based on independent random vectors with missing coordinates. The presence of missing observations is common in modern applications such as climate studies or gene expression micro-arrays. A weak approximation on the spectral distribution in the "large dimension d and large sample size n" asymptotics is derived for possibly different observation probabilities in the coordinates. The spectral distribution turns out to be strongly influenced by the missingness mechanism. In the null case under the missing at random scenario where each component is observed with the same probability p, the limiting spectral distribution is a Marčenko–Pastur law shifted by (1 – p)/p to the left. As d/n → y ϵ (0, 1), the almost sure convergence of the extremal eigenvalues to the respective boundary points of the support of the limiting spectral distribution is proved, which are explicitly given in terms of y and p. 
Eventually, the sample covariance matrix is positive definite if p is larger than 1 – (1 – √y)2, whereas this is not true any longer if p is smaller than this quantity.</description><issn>1350-7265</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><recordid>eNo9kD1PwzAYhD2ARCkM_AAkrwwB24nfOCNU5UuVGICFJXKc142rfFR-o6L-e4KCmE53eu6GY-xKittUanEnIXlYvxqpT9hCplokuQJ9xs6JdkLIDEAs2Nf7Ht0Ybcttb9sjBeKD503YNkkdOuwpDFPOyXb7FrkbDjYG2zvknR1jcEj8O4wN7wJR6Ld8qAjjwY5Tiy7Yqbct4eWfLtnn4_pj9Zxs3p5eVvebxCkDY4JKQ6081OAqrY1OscLJSFN4rQsHCD6HzGvjUXorqtyZQmSZUd57pVSeLtnNvOviQBTRl_sYOhuPpRTl7xGlhHI-YmKvZ3ZH4xD_QQVZoUQq0x_C2l4B</recordid><startdate>20171101</startdate><enddate>20171101</enddate><creator>JURCZAK, KAMIL</creator><creator>ROHDE, ANGELIKA</creator><general>INTERNATIONAL STATISTICAL INSTITUTE</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20171101</creationdate><title>Spectral analysis of high-dimensional sample covariance matrices with missing observations</title><author>JURCZAK, KAMIL ; ROHDE, ANGELIKA</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c286t-e256d2f6d6cb55853ebe6d6189f559c6e6f764f58fe1fa0b7c8904482fff22273</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>JURCZAK, KAMIL</creatorcontrib><creatorcontrib>ROHDE, ANGELIKA</creatorcontrib><collection>CrossRef</collection><jtitle>Bernoulli : official journal of the Bernoulli Society for Mathematical Statistics and Probability</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>JURCZAK, KAMIL</au><au>ROHDE, ANGELIKA</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Spectral analysis of high-dimensional sample covariance 
matrices with missing observations</atitle><jtitle>Bernoulli : official journal of the Bernoulli Society for Mathematical Statistics and Probability</jtitle><date>2017-11-01</date><risdate>2017</risdate><volume>23</volume><issue>4A</issue><spage>2466</spage><epage>2532</epage><pages>2466-2532</pages><issn>1350-7265</issn><abstract>We study high-dimensional sample covariance matrices based on independent random vectors with missing coordinates. The presence of missing observations is common in modern applications such as climate studies or gene expression micro-arrays. A weak approximation on the spectral distribution in the "large dimension d and large sample size n" asymptotics is derived for possibly different observation probabilities in the coordinates. The spectral distribution turns out to be strongly influenced by the missingness mechanism. In the null case under the missing at random scenario where each component is observed with the same probability p, the limiting spectral distribution is a Marčenko–Pastur law shifted by (1 – p)/p to the left. As d/n → y ϵ (0, 1), the almost sure convergence of the extremal eigenvalues to the respective boundary points of the support of the limiting spectral distribution is proved, which are explicitly given in terms of y and p. Eventually, the sample covariance matrix is positive definite if p is larger than 1 – (1 – √y)2, whereas this is not true any longer if p is smaller than this quantity.</abstract><pub>INTERNATIONAL STATISTICAL INSTITUTE</pub><doi>10.3150/16-BEJ815</doi><tpages>67</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1350-7265
ispartof Bernoulli : official journal of the Bernoulli Society for Mathematical Statistics and Probability, 2017-11, Vol.23 (4A), p.2466-2532
issn 1350-7265
language eng
recordid cdi_crossref_primary_10_3150_16_BEJ815
source Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; JSTOR Mathematics & Statistics; Jstor Complete Legacy; Project Euclid Complete
title Spectral analysis of high-dimensional sample covariance matrices with missing observations
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T23%3A03%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Spectral%20analysis%20of%20high-dimensional%20sample%20covariance%20matrices%20with%20missing%20observations&rft.jtitle=Bernoulli%20:%20official%20journal%20of%20the%20Bernoulli%20Society%20for%20Mathematical%20Statistics%20and%20Probability&rft.au=JURCZAK,%20KAMIL&rft.date=2017-11-01&rft.volume=23&rft.issue=4A&rft.spage=2466&rft.epage=2532&rft.pages=2466-2532&rft.issn=1350-7265&rft_id=info:doi/10.3150/16-BEJ815&rft_dat=%3Cjstor_cross%3E26492031%3C/jstor_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_jstor_id=26492031&rfr_iscdi=true