Variable Selection and the Interpretation of Principal Subspaces
Published in: | Journal of agricultural, biological, and environmental statistics, 2001-03, Vol.6 (1), p.62-79 |
---|---|
Main authors: | Cadima, Jorge F. C. L. ; Jolliffe, Ian T. |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
container_end_page | 79 |
---|---|
container_issue | 1 |
container_start_page | 62 |
container_title | Journal of agricultural, biological, and environmental statistics |
container_volume | 6 |
creator | Cadima, Jorge F. C. L. ; Jolliffe, Ian T. |
description | Principal component analysis is widely used in the analysis of multivariate data in the agricultural, biological, and environmental sciences. The first few principal components (PCs) of a set of variables are derived variables with optimal properties in terms of approximating the original variables. This paper considers the problem of identifying subsets of variables that best approximate the full set of variables or their first few PCs, thus stressing dimensionality reduction in terms of the original variables rather than in terms of derived variables (PCs) whose definition requires all the original variables. Criteria for selecting variables are often ill defined and may produce inappropriate subsets. Indicators of the performance of different subsets of the variables are discussed and two criteria are defined. These criteria are used in stepwise selection-type algorithms to choose good subsets. Examples are given that show, among other things, that the selection of variable subsets should not be based only on the PC loadings of the variables. |
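The abstract does not spell out the paper's two criteria, so the following is only a minimal sketch of the general idea under stated assumptions. It scores a subset K of variables by one natural measure, the fraction of the total variance tr(S) of all p variables reproduced by their least-squares regression on the variables in K. A greedy forward search then adds one variable at a time. The function names `explained_variance_fraction` and `forward_select` are hypothetical. The criterion addresses only approximation of the full variable set, not of the first few PCs, and it may differ from the criteria and stepwise algorithms the authors actually define.

```python
import numpy as np

def explained_variance_fraction(S, subset):
    """Fraction of total variance tr(S) reproduced by regressing all
    variables on the variables in `subset` (S is a p x p covariance matrix).
    Hypothetical criterion, not necessarily one of the paper's."""
    k = list(subset)
    S_all_k = S[:, k]            # p x k covariances between all variables and the subset
    S_kk = S[np.ix_(k, k)]       # k x k covariance matrix of the subset
    # tr(S_{.K} S_KK^{-1} S_{K.}) is the variance captured by projecting onto the subset
    captured = np.trace(S_all_k @ np.linalg.solve(S_kk, S_all_k.T))
    return captured / np.trace(S)

def forward_select(S, size):
    """Greedy forward (stepwise-type) selection: repeatedly add the variable
    that most increases explained_variance_fraction until `size` are chosen."""
    chosen = []
    remaining = set(range(S.shape[0]))
    while len(chosen) < size:
        # sorted() makes tie-breaking deterministic (lowest index wins)
        best = max(sorted(remaining),
                   key=lambda j: explained_variance_fraction(S, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Usage on synthetic data (illustrative only, not from the paper)
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))
X[:, 5] = X[:, 0] + 0.1 * rng.normal(size=100)  # near-duplicate of variable 0
S = np.cov(X, rowvar=False)
print(forward_select(S, 3))
```

Because the near-duplicate column carries almost no variance beyond variable 0, a criterion of this kind tends not to select both. Selecting purely by PC loadings would not necessarily avoid that redundancy, which is in the spirit of the abstract's warning.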
doi_str_mv | 10.1198/108571101300325256 |
format | Article |
publisher | American Statistical Association and the International Biometric Society, Washington, DC |
rights | Copyright 2001 American Statistical Association and the International Biometric Society; 2001 INIST-CNRS |
date | 2001-03-01 |
pages | 62-79 (18 pages) |
jstor_id | 1400354 |
peer_reviewed | yes |
link_html | https://www.jstor.org/stable/1400354 |
link_pdf | https://www.jstor.org/stable/pdf/1400354 |
backlink | http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=948062 (Pascal Francis record) |
fulltext | fulltext |
identifier | ISSN: 1085-7117 |
ispartof | Journal of agricultural, biological, and environmental statistics, 2001-03, Vol.6 (1), p.62-79 |
issn | 1085-7117 1537-2693 |
language | eng |
recordid | cdi_crossref_primary_10_1198_108571101300325256 |
source | Jstor Complete Legacy; SpringerLink Journals; JSTOR Mathematics & Statistics |
subjects | Animal, plant and microbial ecology ; Applied statistics ; Approximation ; Biological and medical sciences ; Cardinality ; Correlations ; Covariance matrices ; Crayfish ; Datasets ; Farm economics ; Fundamental and applied biological sciences. Psychology ; General aspects. Techniques ; Methods and techniques (sampling, tagging, trapping, modelling...) ; Principal components analysis ; Statistical variance |
title | Variable Selection and the Interpretation of Principal Subspaces |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T17%3A58%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Variable%20Selection%20and%20the%20Interpretation%20of%20Principal%20Subspaces&rft.jtitle=Journal%20of%20agricultural,%20biological,%20and%20environmental%20statistics&rft.au=Jorge%20F.%20C.%20L.%20Cadima&rft.date=2001-03-01&rft.volume=6&rft.issue=1&rft.spage=62&rft.epage=79&rft.pages=62-79&rft.issn=1085-7117&rft.eissn=1537-2693&rft_id=info:doi/10.1198/108571101300325256&rft_dat=%3Cjstor_cross%3E1400354%3C/jstor_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_jstor_id=1400354&rfr_iscdi=true |