Variable Selection and the Interpretation of Principal Subspaces

Principal component analysis is widely used in the analysis of multivariate data in the agricultural, biological, and environmental sciences. The first few principal components (PCs) of a set of variables are derived variables with optimal properties in terms of approximating the original variables....

Bibliographic details
Published in: Journal of agricultural, biological, and environmental statistics, 2001-03, Vol.6 (1), p.62-79
Main authors: Jorge F. C. L. Cadima; Jolliffe, Ian T.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 79
container_issue 1
container_start_page 62
container_title Journal of agricultural, biological, and environmental statistics
container_volume 6
creator Jorge F. C. L. Cadima
Jolliffe, Ian T.
description Principal component analysis is widely used in the analysis of multivariate data in the agricultural, biological, and environmental sciences. The first few principal components (PCs) of a set of variables are derived variables with optimal properties in terms of approximating the original variables. This paper considers the problem of identifying subsets of variables that best approximate the full set of variables or their first few PCs, thus stressing dimensionality reduction in terms of the original variables rather than in terms of derived variables (PCs) whose definition requires all the original variables. Criteria for selecting variables are often ill defined and may produce inappropriate subsets. Indicators of the performance of different subsets of the variables are discussed and two criteria are defined. These criteria are used in stepwise selection-type algorithms to choose good subsets. Examples are given that show, among other things, that the selection of variable subsets should not be based only on the PC loadings of the variables.
doi_str_mv 10.1198/108571101300325256
format Article
fullrecord <record><control><sourceid>jstor_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1198_108571101300325256</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><jstor_id>1400354</jstor_id><sourcerecordid>1400354</sourcerecordid><originalsourceid>FETCH-LOGICAL-c360t-2be7e3d2e1a44e4b35d982e3b1d278e0e00ad192b332decef75db1edf69d35ea3</originalsourceid><addsrcrecordid>eNplkEtLxEAQhAdRcF39A-Ih4Dk6PY88bsriY2FBYdVr6JnpwSwxCTPx4L93NKIHT900X1UXxdgp8AuAuroEXukSgIPkXAotdLHHFqBlmYuilvtpT0CeiPKQHcW444ksuFiwqxcMLZqOsi11ZKd26DPsXTa9UrbuJwpjoAm_z4PPHkPb23bELtu-mziipXjMDjx2kU5-5pI93948re7zzcPdenW9yW16NOXCUEnSCQJUipSR2tWVIGnAibIiTpyjg1oYKYUjS77UzgA5X9ROakK5ZGL2tWGIMZBvxtC-YfhogDdfHTT_O0ii81k0YrTY-YApfvxV1qrihUjU2Uzt4jSEP1-VfLSSn3C8ZRo</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Variable Selection and the Interpretation of Principal Subspaces</title><source>Jstor Complete Legacy</source><source>SpringerLink Journals</source><source>JSTOR Mathematics &amp; Statistics</source><creator>Jorge F. C. L. Cadima ; Jolliffe, Ian T.</creator><creatorcontrib>Jorge F. C. L. Cadima ; Jolliffe, Ian T.</creatorcontrib><description>Principal component analysis is widely used in the analysis of multivariate data in the agricultural, biological, and environmental sciences. The first few principal components (PCs) of a set of variables are derived variables with optimal properties in terms of approximating the original variables. This paper considers the problem of identifying subsets of variables that best approximate the full set of variables or their first few PCs, thus stressing dimensionality reduction in terms of the original variables rather than in terms of derived variables (PCs) whose definition requires all the original variables. Criteria for selecting variables are often ill defined and may produce inappropriate subsets. 
Indicators of the performance of different subsets of the variables are discussed and two criteria are defined. These criteria are used in stepwise selection-type algorithms to choose good subsets. Examples are given that show, among other things, that the selection of variable subsets should not be based only on the PC loadings of the variables.</description><identifier>ISSN: 1085-7117</identifier><identifier>EISSN: 1537-2693</identifier><identifier>DOI: 10.1198/108571101300325256</identifier><language>eng</language><publisher>Washington, DC: American Statistical Association and the International Biometric Society</publisher><subject>Animal, plant and microbial ecology ; Applied statistics ; Approximation ; Biological and medical sciences ; Cardinality ; Correlations ; Covariance matrices ; Crayfish ; Datasets ; Farm economics ; Fundamental and applied biological sciences. Psychology ; General aspects. Techniques ; Methods and techniques (sampling, tagging, trapping, modelling...) ; Principal components analysis ; Statistical variance</subject><ispartof>Journal of agricultural, biological, and environmental statistics, 2001-03, Vol.6 (1), p.62-79</ispartof><rights>Copyright 2001 American Statistical Association and the International Biometric Society</rights><rights>2001 
INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c360t-2be7e3d2e1a44e4b35d982e3b1d278e0e00ad192b332decef75db1edf69d35ea3</citedby><cites>FETCH-LOGICAL-c360t-2be7e3d2e1a44e4b35d982e3b1d278e0e00ad192b332decef75db1edf69d35ea3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.jstor.org/stable/pdf/1400354$$EPDF$$P50$$Gjstor$$H</linktopdf><linktohtml>$$Uhttps://www.jstor.org/stable/1400354$$EHTML$$P50$$Gjstor$$H</linktohtml><link.rule.ids>314,776,780,799,828,27901,27902,57992,57996,58225,58229</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=948062$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Jorge F. C. L. Cadima</creatorcontrib><creatorcontrib>Jolliffe, Ian T.</creatorcontrib><title>Variable Selection and the Interpretation of Principal Subspaces</title><title>Journal of agricultural, biological, and environmental statistics</title><description>Principal component analysis is widely used in the analysis of multivariate data in the agricultural, biological, and environmental sciences. The first few principal components (PCs) of a set of variables are derived variables with optimal properties in terms of approximating the original variables. This paper considers the problem of identifying subsets of variables that best approximate the full set of variables or their first few PCs, thus stressing dimensionality reduction in terms of the original variables rather than in terms of derived variables (PCs) whose definition requires all the original variables. Criteria for selecting variables are often ill defined and may produce inappropriate subsets. 
Indicators of the performance of different subsets of the variables are discussed and two criteria are defined. These criteria are used in stepwise selection-type algorithms to choose good subsets. Examples are given that show, among other things, that the selection of variable subsets should not be based only on the PC loadings of the variables.</description><subject>Animal, plant and microbial ecology</subject><subject>Applied statistics</subject><subject>Approximation</subject><subject>Biological and medical sciences</subject><subject>Cardinality</subject><subject>Correlations</subject><subject>Covariance matrices</subject><subject>Crayfish</subject><subject>Datasets</subject><subject>Farm economics</subject><subject>Fundamental and applied biological sciences. Psychology</subject><subject>General aspects. Techniques</subject><subject>Methods and techniques (sampling, tagging, trapping, modelling...)</subject><subject>Principal components analysis</subject><subject>Statistical variance</subject><issn>1085-7117</issn><issn>1537-2693</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2001</creationdate><recordtype>article</recordtype><recordid>eNplkEtLxEAQhAdRcF39A-Ih4Dk6PY88bsriY2FBYdVr6JnpwSwxCTPx4L93NKIHT900X1UXxdgp8AuAuroEXukSgIPkXAotdLHHFqBlmYuilvtpT0CeiPKQHcW444ksuFiwqxcMLZqOsi11ZKd26DPsXTa9UrbuJwpjoAm_z4PPHkPb23bELtu-mziipXjMDjx2kU5-5pI93948re7zzcPdenW9yW16NOXCUEnSCQJUipSR2tWVIGnAibIiTpyjg1oYKYUjS77UzgA5X9ROakK5ZGL2tWGIMZBvxtC-YfhogDdfHTT_O0ii81k0YrTY-YApfvxV1qrihUjU2Uzt4jSEP1-VfLSSn3C8ZRo</recordid><startdate>20010301</startdate><enddate>20010301</enddate><creator>Jorge F. C. L. 
Cadima</creator><creator>Jolliffe, Ian T.</creator><general>American Statistical Association and the International Biometric Society</general><general>American Statistical Association</general><general>International Biometric Society</general><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20010301</creationdate><title>Variable Selection and the Interpretation of Principal Subspaces</title><author>Jorge F. C. L. Cadima ; Jolliffe, Ian T.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c360t-2be7e3d2e1a44e4b35d982e3b1d278e0e00ad192b332decef75db1edf69d35ea3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2001</creationdate><topic>Animal, plant and microbial ecology</topic><topic>Applied statistics</topic><topic>Approximation</topic><topic>Biological and medical sciences</topic><topic>Cardinality</topic><topic>Correlations</topic><topic>Covariance matrices</topic><topic>Crayfish</topic><topic>Datasets</topic><topic>Farm economics</topic><topic>Fundamental and applied biological sciences. Psychology</topic><topic>General aspects. Techniques</topic><topic>Methods and techniques (sampling, tagging, trapping, modelling...)</topic><topic>Principal components analysis</topic><topic>Statistical variance</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jorge F. C. L. Cadima</creatorcontrib><creatorcontrib>Jolliffe, Ian T.</creatorcontrib><collection>Pascal-Francis</collection><collection>CrossRef</collection><jtitle>Journal of agricultural, biological, and environmental statistics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Jorge F. C. L. 
Cadima</au><au>Jolliffe, Ian T.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Variable Selection and the Interpretation of Principal Subspaces</atitle><jtitle>Journal of agricultural, biological, and environmental statistics</jtitle><date>2001-03-01</date><risdate>2001</risdate><volume>6</volume><issue>1</issue><spage>62</spage><epage>79</epage><pages>62-79</pages><issn>1085-7117</issn><eissn>1537-2693</eissn><abstract>Principal component analysis is widely used in the analysis of multivariate data in the agricultural, biological, and environmental sciences. The first few principal components (PCs) of a set of variables are derived variables with optimal properties in terms of approximating the original variables. This paper considers the problem of identifying subsets of variables that best approximate the full set of variables or their first few PCs, thus stressing dimensionality reduction in terms of the original variables rather than in terms of derived variables (PCs) whose definition requires all the original variables. Criteria for selecting variables are often ill defined and may produce inappropriate subsets. Indicators of the performance of different subsets of the variables are discussed and two criteria are defined. These criteria are used in stepwise selection-type algorithms to choose good subsets. Examples are given that show, among other things, that the selection of variable subsets should not be based only on the PC loadings of the variables.</abstract><cop>Washington, DC</cop><pub>American Statistical Association and the International Biometric Society</pub><doi>10.1198/108571101300325256</doi><tpages>18</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1085-7117
ispartof Journal of agricultural, biological, and environmental statistics, 2001-03, Vol.6 (1), p.62-79
issn 1085-7117
1537-2693
language eng
recordid cdi_crossref_primary_10_1198_108571101300325256
source Jstor Complete Legacy; SpringerLink Journals; JSTOR Mathematics & Statistics
subjects Animal, plant and microbial ecology
Applied statistics
Approximation
Biological and medical sciences
Cardinality
Correlations
Covariance matrices
Crayfish
Datasets
Farm economics
Fundamental and applied biological sciences. Psychology
General aspects. Techniques
Methods and techniques (sampling, tagging, trapping, modelling...)
Principal components analysis
Statistical variance
title Variable Selection and the Interpretation of Principal Subspaces
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T17%3A58%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Variable%20Selection%20and%20the%20Interpretation%20of%20Principal%20Subspaces&rft.jtitle=Journal%20of%20agricultural,%20biological,%20and%20environmental%20statistics&rft.au=Jorge%20F.%20C.%20L.%20Cadima&rft.date=2001-03-01&rft.volume=6&rft.issue=1&rft.spage=62&rft.epage=79&rft.pages=62-79&rft.issn=1085-7117&rft.eissn=1537-2693&rft_id=info:doi/10.1198/108571101300325256&rft_dat=%3Cjstor_cross%3E1400354%3C/jstor_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_jstor_id=1400354&rfr_iscdi=true