Sparse canonical correlation analysis from a predictive point of view

Bibliographic details
Published in:	Biometrical Journal, 2015-09, Vol. 57 (5), p. 834-851
Main authors:	Wilms, Ines; Croux, Christophe
Format:	Article
Language:	English
Subjects:	Algorithms; Biometry - methods; Canonical correlation analysis; Correlation analysis; Datasets; Genomic data; Genomics; Lasso; Penalized regression; Regression analysis; Sparsity

Description
Summary:	Canonical correlation analysis (CCA) describes the associations between two sets of variables by maximizing the correlation between linear combinations of the variables in each dataset. However, in high-dimensional settings where the number of variables exceeds the sample size or when the variables are highly correlated, traditional CCA is no longer appropriate. This paper proposes a method for sparse CCA. Sparse estimation produces linear combinations of only a subset of variables from each dataset, thereby increasing the interpretability of the canonical variates. We consider the CCA problem from a predictive point of view and recast it into a regression framework. By combining an alternating regression approach together with a lasso penalty, we induce sparsity in the canonical vectors. We compare the performance with other sparse CCA techniques in different simulation settings and illustrate its usefulness on a genomic dataset.
ISSN:	0323-3847, 1521-4036
DOI:	10.1002/bimj.201400226
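
The summary describes recasting CCA as a regression problem and inducing sparsity through alternating lasso-penalized regressions. The following is a minimal illustrative sketch of that general idea for a single pair of canonical vectors. It is not the authors' implementation: the function names (`lasso_cd`, `sparse_cca_pair`), the SVD-based starting value, the unit-variance rescaling, the fixed penalty values and the simulated data are all assumptions made for this example, and the record does not describe how the paper chooses penalties or extracts further pairs.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_sweeps=200, tol=1e-8):
    """Minimize (1/(2n)) * ||y - X w||^2 + lam * ||w||_1 by coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()  # residual y - X w (w starts at zero)
    for _ in range(n_sweeps):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (w[j] removed).
            rho = X[:, j] @ resid / n + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            delta = w_new - w[j]
            if delta != 0.0:
                resid -= X[:, j] * delta
                w[j] = w_new
            max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return w


def unit_variance(M, w):
    """Rescale w so that the variate M @ w has unit standard deviation."""
    s = (M @ w).std()
    if s == 0.0:
        raise ValueError("penalty too large: all coefficients are zero")
    return w / s


def sparse_cca_pair(X, Y, lam_a, lam_b, n_iter=50, tol=1e-6):
    """Alternate lasso regressions of Y @ b on X and of X @ a on Y."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Assumed starting value: leading right singular vector of Y.
    b = unit_variance(Y, np.linalg.svd(Y, full_matrices=False)[2][0])
    a = np.zeros(X.shape[1])
    for _ in range(n_iter):
        a_new = unit_variance(X, lasso_cd(X, Y @ b, lam_a))
        b_new = unit_variance(Y, lasso_cd(Y, X @ a_new, lam_b))
        change = max(np.abs(a_new - a).max(), np.abs(b_new - b).max())
        a, b = a_new, b_new
        if change < tol:
            break
    return a, b


# Simulated data (an assumption for illustration): only the first two
# variables of each set share a common latent signal z.
rng = np.random.default_rng(0)
n, p, q = 300, 10, 8
z = rng.standard_normal(n)
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
X[:, :2] = z[:, None] + 0.3 * rng.standard_normal((n, 2))
Y[:, :2] = z[:, None] + 0.3 * rng.standard_normal((n, 2))

a, b = sparse_cca_pair(X, Y, lam_a=0.15, lam_b=0.15)
rho = np.corrcoef(X @ a, Y @ b)[0, 1]
print("nonzero entries of a:", np.flatnonzero(a))
print("nonzero entries of b:", np.flatnonzero(b))
print("correlation of the canonical variates:", round(rho, 3))
```

In this sketch each regression step treats the current canonical variate of one dataset as the response and the other dataset as the predictors, so the lasso penalty sets many coefficients exactly to zero. Larger values of `lam_a` and `lam_b` give sparser vectors; in practice they would be tuned rather than fixed.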