$\alpha$-TCVAE: On the relationship between Disentanglement and Diversity
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | While disentangled representations have shown promise in generative modeling
and representation learning, their downstream usefulness remains debated.
Recent studies re-defined disentanglement through a formal connection to
symmetries, emphasizing the ability to reduce latent domains and consequently
enhance generative capabilities. However, from an information theory viewpoint,
assigning a complex attribute to a specific latent variable may be infeasible,
limiting the applicability of disentangled representations to simple datasets.
In this work, we introduce $\alpha$-TCVAE, a variational autoencoder optimized
using a novel total correlation (TC) lower bound that maximizes disentanglement
and latent variable informativeness. The proposed TC bound is grounded in
information theory constructs, generalizes the $\beta$-VAE lower bound, and can
be reduced to a convex combination of the known variational information
bottleneck (VIB) and conditional entropy bottleneck (CEB) terms. Moreover, we
present quantitative analyses that support the idea that disentangled
representations lead to better generative capabilities and diversity.
Additionally, we perform downstream task experiments in both the representation
learning and RL domains to assess our questions from a broader ML perspective. Our
results demonstrate that $\alpha$-TCVAE consistently learns more disentangled
representations than baselines and generates more diverse observations without
sacrificing visual fidelity. Notably, $\alpha$-TCVAE exhibits marked
improvements on MPI3D-Real, the most realistic disentangled dataset in our
study, confirming its ability to represent complex datasets when maximizing the
informativeness of individual variables. Finally, testing the proposed model
off-the-shelf on a state-of-the-art model-based RL agent, Director,
clearly demonstrates the downstream usefulness of $\alpha$-TCVAE on the loconav
Ant Maze task. |
---|---|
DOI: | 10.48550/arxiv.2411.00588 |
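
The summary refers to total correlation (TC) without defining it. As an illustrative aside, and not the bound proposed in the paper, TC is the sum of the marginal entropies of the latent variables minus their joint entropy. For a multivariate Gaussian this has a closed form, $\mathrm{TC} = \tfrac{1}{2}\left(\sum_i \log \Sigma_{ii} - \log\det\Sigma\right)$, which is zero exactly when the covariance is diagonal. The sketch below assumes a Gaussian latent distribution; the function names `log_det_spd` and `gaussian_total_correlation` are hypothetical helpers introduced only for this example.

```python
import math


def log_det_spd(cov):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(cov)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Subtract the contribution of the columns already factorized.
            s = cov[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                chol[i][j] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    # det(cov) = prod(diag(chol))^2, so log det = 2 * sum(log diag).
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))


def gaussian_total_correlation(cov):
    """Total correlation (in nats) of z ~ N(mu, cov).

    TC = sum_i H(z_i) - H(z); the (2*pi*e) terms cancel, leaving
    0.5 * (sum_i log cov[i][i] - log det cov).
    """
    marginal_term = sum(math.log(cov[i][i]) for i in range(len(cov)))
    return 0.5 * (marginal_term - log_det_spd(cov))


if __name__ == "__main__":
    independent = [[1.0, 0.0], [0.0, 1.0]]
    correlated = [[1.0, 0.5], [0.5, 1.0]]
    print(gaussian_total_correlation(independent))
    print(gaussian_total_correlation(correlated))
```

Under this Gaussian assumption, independent latent variables give a TC of zero and any correlation between them makes it positive, which is why TC is commonly used as a penalty when encouraging disentangled representations.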