Estimating the construct validity of Principal Components Analysis

In many scientific disciplines, the features of interest cannot be observed directly, so must instead be inferred from observed behaviour. Latent variable analyses are increasingly employed to systematise these inferences, and Principal Components Analysis (PCA) is perhaps the simplest and most popu...

Bibliographic details
Published in:	arXiv.org 2024-01
Main authors:	Hope, Thomas M H; Price, Cathy J; Halai, Ajay; Salvi, Carola; Crinion, Jenny; Keijsers, Merel; Sperber, Christoph; Bowman, Howard
Format:	Article
Language:	eng
Subjects:	Empirical analysis; Noise measurement; Orthogonality; Principal components analysis; Robustness; Variance
Online access:	Full text

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Hope, Thomas M H; Price, Cathy J; Halai, Ajay; Salvi, Carola; Crinion, Jenny; Keijsers, Merel; Sperber, Christoph; Bowman, Howard
description	In many scientific disciplines, the features of interest cannot be observed directly, so must instead be inferred from observed behaviour. Latent variable analyses are increasingly employed to systematise these inferences, and Principal Components Analysis (PCA) is perhaps the simplest and most popular of these methods. Here, we examine how the assumptions that we are prepared to entertain, about the latent variable system, mediate the likelihood that PCA-derived components will capture the true sources of variance underlying data. As expected, we find that this likelihood is excellent in the best case, and robust to empirically reasonable levels of measurement noise, but best-case performance is also: (a) not robust to violations of the method's more prominent assumptions, of linearity and orthogonality; and also (b) requires that other subtler assumptions be made, such as that the latent variables should have varying importance, and that weights relating latent variables to observed data have zero mean. Neither variance explained, nor replication in independent samples, could reliably predict which (if any) PCA-derived components will capture true sources of variance in data. We conclude by describing a procedure to fit these inferences more directly to empirical data, and use it to find that components derived via PCA from two different empirical neuropsychological datasets, are less likely to have meaningful referents in the brain than we hoped.
format	Article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2918027923</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2918027923</sourcerecordid><originalsourceid>FETCH-proquest_journals_29180279233</originalsourceid><addsrcrecordid>eNqNzrEOgjAUQNHGxESi_EMTZ5JSRGBUgnF0cCdNLfpIabHvYcLfy-AHON3lDHfFIpllaVIepNywGLEXQshjIfM8i9i5QYJBEbgnp5fh2jukMGniH2XhATRz3_FbAKdhVJbXfhi9M46Qn5yyMwLu2LpTFk3865btL829viZj8O_JILW9n8KCsZVVWgpZVMvQf-oLEH06fw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2918027923</pqid></control><display><type>article</type><title>Estimating the construct validity of Principal Components Analysis</title><source>Free E- Journals</source><creator>Hope, Thomas M H ; Price, Cathy J ; Halai, Ajay ; Salvi, Carola ; Crinion, Jenny ; Keijsers, Merel ; Sperber, Christoph ; Bowman, Howard</creator><creatorcontrib>Hope, Thomas M H ; Price, Cathy J ; Halai, Ajay ; Salvi, Carola ; Crinion, Jenny ; Keijsers, Merel ; Sperber, Christoph ; Bowman, Howard</creatorcontrib><description>In many scientific disciplines, the features of interest cannot be observed directly, so must instead be inferred from observed behaviour. Latent variable analyses are increasingly employed to systematise these inferences, and Principal Components Analysis (PCA) is perhaps the simplest and most popular of these methods. Here, we examine how the assumptions that we are prepared to entertain, about the latent variable system, mediate the likelihood that PCA-derived components will capture the true sources of variance underlying data. 
As expected, we find that this likelihood is excellent in the best case, and robust to empirically reasonable levels of measurement noise, but best-case performance is also: (a) not robust to violations of the method's more prominent assumptions, of linearity and orthogonality; and also (b) requires that other subtler assumptions be made, such as that the latent variables should have varying importance, and that weights relating latent variables to observed data have zero mean. Neither variance explained, nor replication in independent samples, could reliably predict which (if any) PCA-derived components will capture true sources of variance in data. We conclude by describing a procedure to fit these inferences more directly to empirical data, and use it to find that components derived via PCA from two different empirical neuropsychological datasets, are less likely to have meaningful referents in the brain than we hoped.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Empirical analysis ; Noise measurement ; Orthogonality ; Principal components analysis ; Robustness ; Variance</subject><ispartof>arXiv.org, 2024-01</ispartof><rights>2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,784</link.rule.ids></links><search><creatorcontrib>Hope, Thomas M H</creatorcontrib><creatorcontrib>Price, Cathy J</creatorcontrib><creatorcontrib>Halai, Ajay</creatorcontrib><creatorcontrib>Salvi, Carola</creatorcontrib><creatorcontrib>Crinion, Jenny</creatorcontrib><creatorcontrib>Keijsers, Merel</creatorcontrib><creatorcontrib>Sperber, Christoph</creatorcontrib><creatorcontrib>Bowman, Howard</creatorcontrib><title>Estimating the construct validity of Principal Components Analysis</title><title>arXiv.org</title><description>In many scientific disciplines, the features of interest cannot be observed directly, so must instead be inferred from observed behaviour. Latent variable analyses are increasingly employed to systematise these inferences, and Principal Components Analysis (PCA) is perhaps the simplest and most popular of these methods. Here, we examine how the assumptions that we are prepared to entertain, about the latent variable system, mediate the likelihood that PCA-derived components will capture the true sources of variance underlying data. As expected, we find that this likelihood is excellent in the best case, and robust to empirically reasonable levels of measurement noise, but best-case performance is also: (a) not robust to violations of the method's more prominent assumptions, of linearity and orthogonality; and also (b) requires that other subtler assumptions be made, such as that the latent variables should have varying importance, and that weights relating latent variables to observed data have zero mean. 
Neither variance explained, nor replication in independent samples, could reliably predict which (if any) PCA-derived components will capture true sources of variance in data. We conclude by describing a procedure to fit these inferences more directly to empirical data, and use it to find that components derived via PCA from two different empirical neuropsychological datasets, are less likely to have meaningful referents in the brain than we hoped.</description><subject>Empirical analysis</subject><subject>Noise measurement</subject><subject>Orthogonality</subject><subject>Principal components analysis</subject><subject>Robustness</subject><subject>Variance</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNqNzrEOgjAUQNHGxESi_EMTZ5JSRGBUgnF0cCdNLfpIabHvYcLfy-AHON3lDHfFIpllaVIepNywGLEXQshjIfM8i9i5QYJBEbgnp5fh2jukMGniH2XhATRz3_FbAKdhVJbXfhi9M46Qn5yyMwLu2LpTFk3865btL829viZj8O_JILW9n8KCsZVVWgpZVMvQf-oLEH06fw</recordid><startdate>20240123</startdate><enddate>20240123</enddate><creator>Hope, Thomas M H</creator><creator>Price, Cathy J</creator><creator>Halai, Ajay</creator><creator>Salvi, Carola</creator><creator>Crinion, Jenny</creator><creator>Keijsers, Merel</creator><creator>Sperber, Christoph</creator><creator>Bowman, Howard</creator><general>Cornell University Library, 
arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20240123</creationdate><title>Estimating the construct validity of Principal Components Analysis</title><author>Hope, Thomas M H ; Price, Cathy J ; Halai, Ajay ; Salvi, Carola ; Crinion, Jenny ; Keijsers, Merel ; Sperber, Christoph ; Bowman, Howard</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_29180279233</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Empirical analysis</topic><topic>Noise measurement</topic><topic>Orthogonality</topic><topic>Principal components analysis</topic><topic>Robustness</topic><topic>Variance</topic><toplevel>online_resources</toplevel><creatorcontrib>Hope, Thomas M H</creatorcontrib><creatorcontrib>Price, Cathy J</creatorcontrib><creatorcontrib>Halai, Ajay</creatorcontrib><creatorcontrib>Salvi, Carola</creatorcontrib><creatorcontrib>Crinion, Jenny</creatorcontrib><creatorcontrib>Keijsers, Merel</creatorcontrib><creatorcontrib>Sperber, Christoph</creatorcontrib><creatorcontrib>Bowman, Howard</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest 
Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Hope, Thomas M H</au><au>Price, Cathy J</au><au>Halai, Ajay</au><au>Salvi, Carola</au><au>Crinion, Jenny</au><au>Keijsers, Merel</au><au>Sperber, Christoph</au><au>Bowman, Howard</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Estimating the construct validity of Principal Components Analysis</atitle><jtitle>arXiv.org</jtitle><date>2024-01-23</date><risdate>2024</risdate><eissn>2331-8422</eissn><abstract>In many scientific disciplines, the features of interest cannot be observed directly, so must instead be inferred from observed behaviour. Latent variable analyses are increasingly employed to systematise these inferences, and Principal Components Analysis (PCA) is perhaps the simplest and most popular of these methods. Here, we examine how the assumptions that we are prepared to entertain, about the latent variable system, mediate the likelihood that PCA-derived components will capture the true sources of variance underlying data. 
As expected, we find that this likelihood is excellent in the best case, and robust to empirically reasonable levels of measurement noise, but best-case performance is also: (a) not robust to violations of the method's more prominent assumptions, of linearity and orthogonality; and also (b) requires that other subtler assumptions be made, such as that the latent variables should have varying importance, and that weights relating latent variables to observed data have zero mean. Neither variance explained, nor replication in independent samples, could reliably predict which (if any) PCA-derived components will capture true sources of variance in data. We conclude by describing a procedure to fit these inferences more directly to empirical data, and use it to find that components derived via PCA from two different empirical neuropsychological datasets, are less likely to have meaningful referents in the brain than we hoped.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2024-01
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2918027923
source	Free E- Journals
subjects	Empirical analysis; Noise measurement; Orthogonality; Principal components analysis; Robustness; Variance
title	Estimating the construct validity of Principal Components Analysis
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T01%3A43%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Estimating%20the%20construct%20validity%20of%20Principal%20Components%20Analysis&rft.jtitle=arXiv.org&rft.au=Hope,%20Thomas%20M%20H&rft.date=2024-01-23&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2918027923%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918027923&rft_id=info:pmid/&rfr_iscdi=true