Why Overfitting Is Not (Usually) a Problem in Partial Correlation Networks
Saved in:
Published in: | Psychological methods 2022-10, Vol.27 (5), p.822-840 |
---|---|
Main authors: | Williams, Donald R.; Rodriguez, Josue E. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 840 |
---|---|
container_issue | 5 |
container_start_page | 822 |
container_title | Psychological methods |
container_volume | 27 |
creator | Williams, Donald R.; Rodriguez, Josue E. |
description | Network psychometrics is undergoing a time of methodological reflection. In part, this was spurred by the revelation that ℓ1-regularization does not reduce spurious associations in partial correlation networks. In this work, we address another motivation for the widespread use of regularized estimation: the thought that it is needed to mitigate overfitting. We first clarify important aspects of overfitting and the bias-variance tradeoff that are especially relevant for the network literature, where the number of nodes or items in a psychometric scale are not large compared to the number of observations (i.e., a low p/n ratio). This revealed that bias and especially variance are most problematic in p/n ratios rarely encountered. We then introduce a nonregularized method, based on classical hypothesis testing, that fulfills two desiderata: (a) reducing or controlling the false positives rate and (b) quelling concerns of overfitting by providing accurate predictions. These were the primary motivations for initially adopting the graphical lasso (glasso). In several simulation studies, our nonregularized method provided more than competitive predictive performance, and, in many cases, outperformed glasso. It appears to be nonregularized, as opposed to regularized estimation, that best satisfies these desiderata. We then provide insights into using our methodology. Here we discuss the multiple comparisons problem in relation to prediction: stringent alpha levels, resulting in a sparse network, can deteriorate predictive accuracy. We end by emphasizing key advantages of our approach that make it ideal for both inference and prediction in network analysis.
Translational Abstract
It is vital to clearly understand the benefits and limitations of regularized networks, as inferences drawn from them may hold methodological and clinical implications. This article addresses a core rationale for the increasing adoption of regularized estimation: namely, that it reduces overfitting. Accordingly, we elucidate important aspects of overfitting and the bias-variance tradeoff that are especially relevant for network research, where the number of variables is small relative to the number of observations (i.e., a low p/n ratio). We find that bias, and especially variance, are the most problematic aspects for inference in p/n ratios that are rare in psychometric settings. We then introduce a nonregularized method based on classical techniques that fulfills two desiderata: (1) reducing or controlling chance findings and (2) avoiding overfitting by providing accurate predictions. In several simulation studies, our nonregularized method provided more than competitive predictive performance and, in many cases, outperformed regularized networks. It appears to be nonregularized, as opposed to regularized, estimation that best satisfies these desiderata. We then provide insights into using our methodology. Here we discuss the multiple comparisons problem in relation to prediction: stringent alpha levels, resulting in a network with few associations, can deteriorate predictive accuracy. We end by emphasizing key advantages of our approach that make it ideal for both inference and prediction in network analysis. |
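The abstract above describes the classical, nonregularized approach: estimate partial correlations from the inverse covariance (precision) matrix and retain only the edges that survive a hypothesis test at level alpha. A minimal sketch of that idea in Python follows; it is not the authors' exact implementation, and the Fisher z-test formulation and the `nonreg_network` helper name are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def partial_correlations(x):
    """Partial correlation matrix from the inverse covariance (precision) matrix."""
    prec = np.linalg.inv(np.cov(x, rowvar=False))
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)  # standardize off-diagonals
    np.fill_diagonal(pcor, 1.0)
    return pcor


def nonreg_network(x, alpha=0.05):
    """Nonregularized network: keep edges whose partial correlation is
    significant under a Fisher z-test; set all other edges to zero.

    The degrees-of-freedom correction (n - (p - 2) - 3) accounts for the
    p - 2 variables conditioned on for each pairwise partial correlation.
    """
    n, p = x.shape
    pcor = partial_correlations(x)
    z = np.arctanh(np.clip(pcor, -0.9999, 0.9999))  # Fisher z-transform
    se = 1.0 / np.sqrt(n - (p - 2) - 3)
    pvals = 2 * stats.norm.sf(np.abs(z) / se)       # two-sided test
    adj = pcor * (pvals < alpha)                    # zero out nonsignificant edges
    np.fill_diagonal(adj, 0.0)
    return adj
```

Raising the stringency of alpha (e.g., 1e-6 instead of 0.05) prunes more edges, which, as the abstract notes, can hurt predictive accuracy by discarding small but real associations.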
doi_str_mv | 10.1037/met0000437 |
format | Article |
publisher | United States: American Psychological Association |
contributor | Steinley, Douglas |
pmid | 35420856 |
eissn | 1939-1463 |
rights | 2022, American Psychological Association |
orcidid | 0000-0001-6735-8785; 0000-0002-9092-4869 |
oa | free_for_read |
fulltext | fulltext |
identifier | ISSN: 1082-989X |
ispartof | Psychological methods, 2022-10, Vol.27 (5), p.822-840 |
issn | 1082-989X; 1939-1463 (electronic) |
language | eng |
recordid | cdi_proquest_miscellaneous_2651686313 |
source | EBSCOhost APA PsycARTICLES |
subjects | Error Analysis; Estimation; Female; Human; Inference; Male; Methodology; Motivation; Predictability (Measurement); Prediction Errors; Statistical Correlation |
title | Why Overfitting Is Not (Usually) a Problem in Partial Correlation Networks |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T06%3A07%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Why%20Overfitting%20Is%20Not%20(Usually)%20a%20Problem%20in%20Partial%20Correlation%20Networks&rft.jtitle=Psychological%20methods&rft.au=Williams,%20Donald%20R.&rft.date=2022-10-01&rft.volume=27&rft.issue=5&rft.spage=822&rft.epage=840&rft.pages=822-840&rft.issn=1082-989X&rft.eissn=1939-1463&rft_id=info:doi/10.1037/met0000437&rft_dat=%3Cproquest_cross%3E2651686313%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2650035753&rft_id=info:pmid/35420856&rfr_iscdi=true |