CoCA: Cooperative Component Analysis
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | We propose Cooperative Component Analysis (CoCA), a new method for
unsupervised multi-view analysis: it identifies the component that
simultaneously captures significant within-view variance and exhibits strong
cross-view correlation. The challenge of integrating multi-view data is
particularly important in biology and medicine, where various types of "-omic"
data, ranging from genomics to proteomics, are measured on the same set of
samples. The goal is to uncover important, shared signals that represent
underlying biological mechanisms. CoCA combines an approximation error loss to
preserve information within data views and an "agreement penalty" to encourage
alignment across data views. By balancing the trade-off between these two
components of the objective, CoCA interpolates between the commonly used
principal component analysis (PCA) and canonical correlation analysis (CCA),
which arise as special cases at the two ends of the solution path. CoCA
chooses the degree of agreement in a data-adaptive manner, using a validation
set or cross-validation to estimate test error. Furthermore, we propose a
sparse variant of CoCA that incorporates the Lasso penalty to yield feature
sparsity, facilitating the identification of key features driving the observed
patterns. We demonstrate the effectiveness of CoCA on simulated data and two
real multiomics studies of COVID-19 and ductal carcinoma in situ of the breast. In
both real data applications, CoCA successfully integrates multiomics data,
extracting components that are not only consistently present across different
data views but also more informative and predictive of disease progression.
CoCA offers a powerful framework for discovering important shared signals in
multi-view data, with the potential to uncover novel insights in an
increasingly multi-view data world. |
---|---|
DOI: | 10.48550/arxiv.2407.16870 |