The Price of Competition: Effect Size Heterogeneity Matters in High Dimensions

In high-dimensional sparse regression, would increasing the signal-to-noise ratio while fixing the sparsity level always lead to better model selection? For high-dimensional sparse regression problems, surprisingly, in this paper we answer this question in the negative in the regime of linear sparsi...

Detailed description

Bibliographic details
Published in: IEEE transactions on information theory 2022-08, Vol.68 (8), p.5268-5294
Main authors: Wang, Hua; Yang, Yachong; Su, Weijie J.
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 5294
container_issue 8
container_start_page 5268
container_title IEEE transactions on information theory
container_volume 68
creator Wang, Hua
Yang, Yachong
Su, Weijie J.
description In high-dimensional sparse regression, would increasing the signal-to-noise ratio while fixing the sparsity level always lead to better model selection? For high-dimensional sparse regression problems, surprisingly, in this paper we answer this question in the negative in the regime of linear sparsity for the Lasso method, relying on a new concept we term effect size heterogeneity. Roughly speaking, a regression coefficient vector has high effect size heterogeneity if its nonzero entries have significantly different magnitudes. From the viewpoint of this new measure, we prove that the false and true positive rates achieve the optimal trade-off uniformly along the Lasso path when this measure is maximal in a certain sense, and the worst trade-off is achieved when it is minimal in the sense that all nonzero effect sizes are roughly equal. Moreover, we demonstrate that the first false selection occurs much earlier when effect size heterogeneity is minimal than when it is maximal. The underlying cause of these two phenomena is, metaphorically speaking, the "competition" among variables with effect sizes of the same magnitude in entering the model. Taken together, our findings suggest that effect size heterogeneity shall serve as an important complementary measure to the sparsity of regression coefficients in the analysis of high-dimensional regression problems. Our proofs use techniques from approximate message passing theory as well as a novel technique for estimating the rank of the first false variable.
doi_str_mv 10.1109/TIT.2022.3166720
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2688696875</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9756650</ieee_id><sourcerecordid>2688696875</sourcerecordid><originalsourceid>FETCH-LOGICAL-c333t-220f3adee5b30291aed010c3d0b920e8ed19f2d3967f0ee70cdba82c66540cbe3</originalsourceid><addsrcrecordid>eNo9kEtLAzEUhYMoWKt7wU3A9dSbZCYzcSe12kJ9gOM6TDM3bYqdqUm6qL_elIqry4Hz4H6EXDMYMQbqrp7VIw6cjwSTsuRwQgasKMpMySI_JQMAVmUqz6tzchHCOsm8YHxAXusV0nfvDNLe0nG_2WJ00fXdPZ1YiybSD_eDdIoRfb_EDl3c05cmJhmo6-jULVf00W2wCykULsmZbb4CXv3dIfl8mtTjaTZ_e56NH-aZEULEjHOwomkRi4UArliDLTAwooWF4oAVtkxZ3golSwuIJZh20VTcyPQMmAWKIbk99m59_73DEPW63_kuTWouq0oqWZVFcsHRZXwfgkert95tGr_XDPSBmk7U9IGa_qOWIjfHiEPEf7sqizQN4hcLq2hA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2688696875</pqid></control><display><type>article</type><title>The Price of Competition: Effect Size Heterogeneity Matters in High Dimensions</title><source>IEEE Electronic Library (IEL)</source><creator>Wang, Hua ; Yang, Yachong ; Su, Weijie J.</creator><creatorcontrib>Wang, Hua ; Yang, Yachong ; Su, Weijie J.</creatorcontrib><description>In high-dimensional sparse regression, would increasing the signal-to-noise ratio while fixing the sparsity level always lead to better model selection? For high-dimensional sparse regression problems, surprisingly, in this paper we answer this question in the negative in the regime of linear sparsity for the Lasso method, relying on a new concept we term effect size heterogeneity . Roughly speaking, a regression coefficient vector has high effect size heterogeneity if its nonzero entries have significantly different magnitudes. 
From the viewpoint of this new measure, we prove that the false and true positive rates achieve the optimal trade-off uniformly along the Lasso path when this measure is maximal in a certain sense, and the worst trade-off is achieved when it is minimal in the sense that all nonzero effect sizes are roughly equal. Moreover, we demonstrate that the first false selection occurs much earlier when effect size heterogeneity is minimal than when it is maximal. The underlying cause of these two phenomena is, metaphorically speaking, the "competition" among variables with effect sizes of the same magnitude in entering the model. Taken together, our findings suggest that effect size heterogeneity shall serve as an important complementary measure to the sparsity of regression coefficients in the analysis of high-dimensional regression problems. Our proofs use techniques from approximate message passing theory as well as a novel technique for estimating the rank of the first false variable.</description><identifier>ISSN: 0018-9448</identifier><identifier>EISSN: 1557-9654</identifier><identifier>DOI: 10.1109/TIT.2022.3166720</identifier><identifier>CODEN: IETTAW</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Approximate message passing ; Competition ; Data models ; Data science ; Dimensional analysis ; false discovery rate ; Heterogeneity ; high-dimensional sparse regression ; Indexes ; Input variables ; Message passing ; model selection ; Noise level ; Regression coefficients ; Signal to noise ratio ; Sparsity ; Tradeoffs ; Upper bound</subject><ispartof>IEEE transactions on information theory, 2022-08, Vol.68 (8), p.5268-5294</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c333t-220f3adee5b30291aed010c3d0b920e8ed19f2d3967f0ee70cdba82c66540cbe3</citedby><cites>FETCH-LOGICAL-c333t-220f3adee5b30291aed010c3d0b920e8ed19f2d3967f0ee70cdba82c66540cbe3</cites><orcidid>0000-0003-1787-1219 ; 0000-0001-9780-4918 ; 0000-0002-7268-4152</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9756650$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9756650$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Wang, Hua</creatorcontrib><creatorcontrib>Yang, Yachong</creatorcontrib><creatorcontrib>Su, Weijie J.</creatorcontrib><title>The Price of Competition: Effect Size Heterogeneity Matters in High Dimensions</title><title>IEEE transactions on information theory</title><addtitle>TIT</addtitle><description>In high-dimensional sparse regression, would increasing the signal-to-noise ratio while fixing the sparsity level always lead to better model selection? For high-dimensional sparse regression problems, surprisingly, in this paper we answer this question in the negative in the regime of linear sparsity for the Lasso method, relying on a new concept we term effect size heterogeneity . Roughly speaking, a regression coefficient vector has high effect size heterogeneity if its nonzero entries have significantly different magnitudes. 
From the viewpoint of this new measure, we prove that the false and true positive rates achieve the optimal trade-off uniformly along the Lasso path when this measure is maximal in a certain sense, and the worst trade-off is achieved when it is minimal in the sense that all nonzero effect sizes are roughly equal. Moreover, we demonstrate that the first false selection occurs much earlier when effect size heterogeneity is minimal than when it is maximal. The underlying cause of these two phenomena is, metaphorically speaking, the "competition" among variables with effect sizes of the same magnitude in entering the model. Taken together, our findings suggest that effect size heterogeneity shall serve as an important complementary measure to the sparsity of regression coefficients in the analysis of high-dimensional regression problems. Our proofs use techniques from approximate message passing theory as well as a novel technique for estimating the rank of the first false variable.</description><subject>Approximate message passing</subject><subject>Competition</subject><subject>Data models</subject><subject>Data science</subject><subject>Dimensional analysis</subject><subject>false discovery rate</subject><subject>Heterogeneity</subject><subject>high-dimensional sparse regression</subject><subject>Indexes</subject><subject>Input variables</subject><subject>Message passing</subject><subject>model selection</subject><subject>Noise level</subject><subject>Regression coefficients</subject><subject>Signal to noise ratio</subject><subject>Sparsity</subject><subject>Tradeoffs</subject><subject>Upper 
bound</subject><issn>0018-9448</issn><issn>1557-9654</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kEtLAzEUhYMoWKt7wU3A9dSbZCYzcSe12kJ9gOM6TDM3bYqdqUm6qL_elIqry4Hz4H6EXDMYMQbqrp7VIw6cjwSTsuRwQgasKMpMySI_JQMAVmUqz6tzchHCOsm8YHxAXusV0nfvDNLe0nG_2WJ00fXdPZ1YiybSD_eDdIoRfb_EDl3c05cmJhmo6-jULVf00W2wCykULsmZbb4CXv3dIfl8mtTjaTZ_e56NH-aZEULEjHOwomkRi4UArliDLTAwooWF4oAVtkxZ3golSwuIJZh20VTcyPQMmAWKIbk99m59_73DEPW63_kuTWouq0oqWZVFcsHRZXwfgkert95tGr_XDPSBmk7U9IGa_qOWIjfHiEPEf7sqizQN4hcLq2hA</recordid><startdate>20220801</startdate><enddate>20220801</enddate><creator>Wang, Hua</creator><creator>Yang, Yachong</creator><creator>Su, Weijie J.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0003-1787-1219</orcidid><orcidid>https://orcid.org/0000-0001-9780-4918</orcidid><orcidid>https://orcid.org/0000-0002-7268-4152</orcidid></search><sort><creationdate>20220801</creationdate><title>The Price of Competition: Effect Size Heterogeneity Matters in High Dimensions</title><author>Wang, Hua ; Yang, Yachong ; Su, Weijie J.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c333t-220f3adee5b30291aed010c3d0b920e8ed19f2d3967f0ee70cdba82c66540cbe3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Approximate message passing</topic><topic>Competition</topic><topic>Data models</topic><topic>Data science</topic><topic>Dimensional analysis</topic><topic>false discovery rate</topic><topic>Heterogeneity</topic><topic>high-dimensional sparse 
regression</topic><topic>Indexes</topic><topic>Input variables</topic><topic>Message passing</topic><topic>model selection</topic><topic>Noise level</topic><topic>Regression coefficients</topic><topic>Signal to noise ratio</topic><topic>Sparsity</topic><topic>Tradeoffs</topic><topic>Upper bound</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wang, Hua</creatorcontrib><creatorcontrib>Yang, Yachong</creatorcontrib><creatorcontrib>Su, Weijie J.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on information theory</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Wang, Hua</au><au>Yang, Yachong</au><au>Su, Weijie J.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>The Price of Competition: Effect Size Heterogeneity Matters in High Dimensions</atitle><jtitle>IEEE transactions on information theory</jtitle><stitle>TIT</stitle><date>2022-08-01</date><risdate>2022</risdate><volume>68</volume><issue>8</issue><spage>5268</spage><epage>5294</epage><pages>5268-5294</pages><issn>0018-9448</issn><eissn>1557-9654</eissn><coden>IETTAW</coden><abstract>In high-dimensional sparse regression, would increasing the 
signal-to-noise ratio while fixing the sparsity level always lead to better model selection? For high-dimensional sparse regression problems, surprisingly, in this paper we answer this question in the negative in the regime of linear sparsity for the Lasso method, relying on a new concept we term effect size heterogeneity . Roughly speaking, a regression coefficient vector has high effect size heterogeneity if its nonzero entries have significantly different magnitudes. From the viewpoint of this new measure, we prove that the false and true positive rates achieve the optimal trade-off uniformly along the Lasso path when this measure is maximal in a certain sense, and the worst trade-off is achieved when it is minimal in the sense that all nonzero effect sizes are roughly equal. Moreover, we demonstrate that the first false selection occurs much earlier when effect size heterogeneity is minimal than when it is maximal. The underlying cause of these two phenomena is, metaphorically speaking, the "competition" among variables with effect sizes of the same magnitude in entering the model. Taken together, our findings suggest that effect size heterogeneity shall serve as an important complementary measure to the sparsity of regression coefficients in the analysis of high-dimensional regression problems. Our proofs use techniques from approximate message passing theory as well as a novel technique for estimating the rank of the first false variable.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TIT.2022.3166720</doi><tpages>27</tpages><orcidid>https://orcid.org/0000-0003-1787-1219</orcidid><orcidid>https://orcid.org/0000-0001-9780-4918</orcidid><orcidid>https://orcid.org/0000-0002-7268-4152</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 0018-9448
ispartof IEEE transactions on information theory, 2022-08, Vol.68 (8), p.5268-5294
issn 0018-9448
1557-9654
language eng
recordid cdi_proquest_journals_2688696875
source IEEE Electronic Library (IEL)
subjects Approximate message passing
Competition
Data models
Data science
Dimensional analysis
false discovery rate
Heterogeneity
high-dimensional sparse regression
Indexes
Input variables
Message passing
model selection
Noise level
Regression coefficients
Signal to noise ratio
Sparsity
Tradeoffs
Upper bound
title The Price of Competition: Effect Size Heterogeneity Matters in High Dimensions
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T10%3A04%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20Price%20of%20Competition:%20Effect%20Size%20Heterogeneity%20Matters%20in%20High%20Dimensions&rft.jtitle=IEEE%20transactions%20on%20information%20theory&rft.au=Wang,%20Hua&rft.date=2022-08-01&rft.volume=68&rft.issue=8&rft.spage=5268&rft.epage=5294&rft.pages=5268-5294&rft.issn=0018-9448&rft.eissn=1557-9654&rft.coden=IETTAW&rft_id=info:doi/10.1109/TIT.2022.3166720&rft_dat=%3Cproquest_RIE%3E2688696875%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2688696875&rft_id=info:pmid/&rft_ieee_id=9756650&rfr_iscdi=true
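
The description in this record refers to the false and true positive rates along the Lasso path and to the rank of the first false variable. As a rough illustration of those quantities only, the following Python sketch computes them for a toy example. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `tpp_fdp` and `rank_of_first_false` are hypothetical, the true positive proportion is taken as true selections divided by the number of true signals, the false discovery proportion as false selections divided by the number of selections, and the rank of the first false variable as one plus the number of true variables that enter before it. The entry order is made up, and the sketch assumes that no variable leaves the path once it has entered.

```python
from typing import Iterable, Sequence, Set, Tuple


def tpp_fdp(selected: Iterable[int], true_support: Set[int]) -> Tuple[float, float]:
    """Return (TPP, FDP) for one selected set of variable indices.

    Assumed definitions (not quoted from the paper):
    TPP = true selections / number of true signals,
    FDP = false selections / number of selections (0 if nothing is selected).
    """
    chosen = set(selected)
    true_hits = len(chosen & true_support)
    false_hits = len(chosen - true_support)
    tpp = true_hits / max(len(true_support), 1)
    fdp = false_hits / max(len(chosen), 1)
    return tpp, fdp


def rank_of_first_false(entry_order: Sequence[int], true_support: Set[int]) -> int:
    """One plus the number of true variables that enter before the first false one."""
    true_before = 0
    for j in entry_order:
        if j not in true_support:
            break
        true_before += 1
    return true_before + 1


# Toy data: variables 0..3 are the true signals; the entry order is invented.
true_support = {0, 1, 2, 3}
entry_order = [2, 0, 7, 1, 3, 9]

# (TPP, FDP) after each variable enters, tracing a trade-off curve along the path.
path_points = [
    tpp_fdp(entry_order[:k], true_support) for k in range(1, len(entry_order) + 1)
]
first_false_rank = rank_of_first_false(entry_order, true_support)

for k, (tpp, fdp) in enumerate(path_points, start=1):
    print(f"step {k}: TPP={tpp:.2f}, FDP={fdp:.2f}")
print("rank of first false variable:", first_false_rank)
```

In this toy order the first false variable (index 7) enters third, so its rank is 3. In a real experiment the entry order would come from a Lasso path solver rather than being written by hand.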