Variable Selection and the Interpretation of Principal Subspaces
Published in: | Journal of agricultural, biological, and environmental statistics, 2001-03, Vol.6 (1), p.62-79 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Abstract: | Principal component analysis is widely used in the analysis of multivariate data in the agricultural, biological, and environmental sciences. The first few principal components (PCs) of a set of variables are derived variables with optimal properties in terms of approximating the original variables. This paper considers the problem of identifying subsets of variables that best approximate the full set of variables or their first few PCs, thus stressing dimensionality reduction in terms of the original variables rather than in terms of derived variables (PCs) whose definition requires all the original variables. Criteria for selecting variables are often ill-defined and may produce inappropriate subsets. Indicators of the performance of different subsets of the variables are discussed and two criteria are defined. These criteria are used in stepwise selection-type algorithms to choose good subsets. Examples are given that show, among other things, that the selection of variable subsets should not be based only on the PC loadings of the variables. |
---|---|
ISSN: | 1085-7117, 1537-2693 |
DOI: | 10.1198/108571101300325256 |
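
The abstract describes stepwise algorithms that choose a subset of the original variables to approximate the full data set. The record does not state the paper's two criteria, so the following is only a minimal sketch under an assumed, illustrative criterion: the fraction of total variance recovered when every variable is linearly regressed on the chosen subset. The function names `explained_fraction` and `forward_select` are hypothetical and are not taken from the paper.

```python
import numpy as np


def explained_fraction(S, subset):
    """Share of total variance recovered by regressing all variables on `subset`.

    S is a (p x p) covariance matrix. This criterion is an illustrative
    assumption, not necessarily one of the criteria defined in the paper.
    """
    idx = list(subset)
    S_kk = S[np.ix_(idx, idx)]   # covariance among the selected variables
    S_pk = S[:, idx]             # covariance of all variables with the selected ones
    # trace(S_pk S_kk^{-1} S_kp) / trace(S)
    fitted = S_pk @ np.linalg.solve(S_kk, S_pk.T)
    return np.trace(fitted) / np.trace(S)


def forward_select(S, k):
    """Greedy forward selection: add the variable that most improves the criterion."""
    p = S.shape[0]
    chosen = []
    for _ in range(k):
        remaining = [j for j in range(p) if j not in chosen]
        best = max(remaining, key=lambda j: explained_fraction(S, chosen + [j]))
        chosen.append(best)
    return chosen, explained_fraction(S, chosen)


if __name__ == "__main__":
    # Synthetic data, used only to exercise the functions.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    S = np.cov(X, rowvar=False)
    print(forward_select(S, 2))
```

A greedy forward pass is only one of several stepwise schemes (backward elimination is another), and it is not guaranteed to find the best subset of a given size. Note that the criterion here scores subsets directly from the covariance matrix rather than from PC loadings, which is in the spirit of the abstract's closing remark.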