COPA: Constrained PARAFAC2 for Sparse & Large Datasets
PARAFAC2 has demonstrated success in modeling irregular tensors, where the tensor dimensions vary across one of the modes. An example scenario is modeling treatments across a set of patients with the varying number of medical encounters over time. Despite recent improvements on unconstrained PARAFAC...
Gespeichert in:
Hauptverfasser: | , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | PARAFAC2 has demonstrated success in modeling irregular tensors, where the
tensor dimensions vary across one of the modes. An example scenario is modeling
treatments across a set of patients with the varying number of medical
encounters over time. Despite recent improvements on unconstrained PARAFAC2,
its model factors are usually dense and sensitive to noise which limits their
interpretability. As a result, the following open challenges remain: a) various
modeling constraints, such as temporal smoothness, sparsity and non-negativity,
are needed to be imposed for interpretable temporal modeling and b) a scalable
approach is required to support those constraints efficiently for large
datasets. To tackle these challenges, we propose a {\it CO}nstrained {\it
PA}RAFAC2 (COPA) method, which carefully incorporates optimization constraints
such as temporal smoothness, sparsity, and non-negativity in the resulting
factors. To efficiently support all those constraints, COPA adopts a hybrid
optimization framework using alternating optimization and alternating direction
method of multiplier (AO-ADMM). As evaluated on large electronic health record
(EHR) datasets with hundreds of thousands of patients, COPA achieves
significant speedups (up to 36 times faster) over prior PARAFAC2 approaches
that only attempt to handle a subset of the constraints that COPA enables.
Overall, our method outperforms all the baselines attempting to handle a subset
of the constraints in terms of speed, while achieving the same level of
accuracy. Through a case study on temporal phenotyping of medically complex
children, we demonstrate how the constraints imposed by COPA reveal concise
phenotypes and meaningful temporal profiles of patients. The clinical
interpretation of both the phenotypes and the temporal profiles was confirmed
by a medical expert. |
---|---|
DOI: | 10.48550/arxiv.1803.04572 |