Principal component analysis for interval-valued observations

Bibliographic details
Published in:	Statistical Analysis and Data Mining, April 2011, Vol. 4 (2), pp. 229–246
Authors:	Douzal-Chouakria, A., Billard, L., Diday, E.
Format:	Article
Language:	English
Keywords:	correlations; inertia; Mathematics; Statistics; Statistics Theory; vertex contributions; vertices principal components

Description
Abstract:	One feature of contemporary datasets is that, instead of the single point value in the p-dimensional space ℜ^p seen in classical data, the data may take interval values, thus producing hypercubes in ℜ^p. This paper studies the vertices principal components methodology for interval-valued data and provides enhancements to allow for so-called ‘trivial’ intervals and generalized weight functions. It also introduces the concept of vertex contributions to the underlying principal components, a concept not possible for classical data but one that provides a visualization method that further aids in interpreting the methodology. The method is illustrated on a dataset of facial-characteristic measurements obtained from a study of face recognition patterns for surveillance purposes. A comparison with analyses in which classical surrogates replace the intervals shows how the symbolic analysis gives more informative conclusions. A second example illustrates how the method can be applied even when the number of parameters exceeds the number of observations, as well as how uncertainty data can be accommodated. © 2011 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 4: 229–246, 2011
ISSN:	1932-1864, 1932-1872
DOI:	10.1002/sam.10118
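The abstract names the vertices principal components method without spelling it out. As a rough illustration, and not the authors' implementation, the sketch below assumes the usual vertices approach: each interval observation is expanded into the distinct corners of its hyperrectangle (a trivial interval, with equal endpoints, contributes a single value), classical PCA is run on the stacked vertex matrix, and each observation's principal component is taken as the interval spanned by its vertices' coordinates. Several details are assumptions: each observation gets weight 1/n shared equally among its vertices, the vertex matrix is standardized with those weights, and a vertex's contribution to axis k is taken as the classical row contribution w_r·y_rk²/λ_k. The paper's generalized weight functions and its definition of vertex contributions may differ. The function names `distinct_vertices` and `vertices_pca` are hypothetical.

```python
import itertools

import numpy as np


def distinct_vertices(lower, upper):
    """Return the distinct corners of the hyperrectangle [lower, upper] in R^p.

    A trivial interval (lower == upper) contributes one value, so an
    observation with q non-trivial intervals yields 2**q vertices.
    """
    choices = [(lo,) if lo == hi else (lo, hi) for lo, hi in zip(lower, upper)]
    return np.array(list(itertools.product(*choices)), dtype=float)


def vertices_pca(lower, upper, n_components=2):
    """Sketch of vertices PCA for an (n, p) table of interval endpoints."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n, _ = lower.shape

    blocks, owner, weights = [], [], []
    for i in range(n):
        v = distinct_vertices(lower[i], upper[i])
        blocks.append(v)
        owner.extend([i] * len(v))
        # Assumption: observation weight 1/n, split equally among its vertices.
        weights.extend([1.0 / (n * len(v))] * len(v))
    X = np.vstack(blocks)
    owner = np.array(owner)
    w = np.array(weights)  # sums to 1

    # Weighted standardization of the vertex matrix.
    Xc = X - w @ X
    sd = np.sqrt(w @ Xc**2)
    sd[sd == 0] = 1.0  # guard against a constant variable
    Z = Xc / sd

    # Weighted correlation matrix and its leading eigenpairs.
    R = Z.T @ (w[:, None] * Z)
    eigval, eigvec = np.linalg.eigh(R)  # ascending order
    order = np.argsort(eigval)[::-1][:n_components]
    eigval, eigvec = eigval[order], eigvec[:, order]

    scores = Z @ eigvec  # coordinates of every vertex on each axis

    # Each observation's component is the [min, max] of its vertices' scores.
    pc_intervals = np.stack([
        np.stack([scores[owner == i].min(axis=0),
                  scores[owner == i].max(axis=0)], axis=-1)
        for i in range(n)
    ])  # shape (n, n_components, 2)

    # Assumed contribution of vertex r to the inertia of axis k.
    contrib = w[:, None] * scores**2 / eigval
    return eigval, pc_intervals, scores, owner, contrib


if __name__ == "__main__":
    # Three toy observations on two interval variables; the third has a
    # trivial interval [4, 4] on the first variable.
    lower = [[1, 2], [3, 5], [4, 4]]
    upper = [[2, 3], [5, 6], [4, 7]]
    eigval, pcs, scores, owner, contrib = vertices_pca(lower, upper)
    print("eigenvalues:", eigval)
    print("PC1 intervals:", pcs[:, 0, :])
```

Because the weighted inertia along axis k equals λ_k, the assumed vertex contributions on each axis sum to 1, which is what makes them usable for ranking vertices when interpreting a component.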