Bayesian mixture variational autoencoders for multi-modal learning

Bibliographic details
Published in: Machine Learning, 2022-12, Vol. 111 (12), pp. 4329-4357
Authors: Liao, Keng-Te; Huang, Bo-Wei; Yang, Chih-Chun; Lin, Shou-De
Format: Article
Language: English
Online access: Full text
Description
Abstract: This paper provides an in-depth analysis of how to effectively acquire and generalize cross-modal knowledge for multi-modal learning. Mixture-of-Experts (MoE) and Product-of-Experts (PoE) are two popular approaches to combining multi-modal information. Existing works based on MoE or PoE have shown notable improvements in data generation, but new challenges also emerge, such as high training cost, overconfident experts, and difficulty in encoding modality-specific features. In this work, we propose the Bayesian mixture variational autoencoder (BMVAE), which learns to select or combine experts via Bayesian inference. We show that this idea naturally encourages models to learn modality-specific knowledge and avoids overconfident experts, and that it is compatible with both the MoE and PoE frameworks. As an MoE model, BMVAE can be optimized with a tight lower bound and is efficient to train; the PoE variant shares these advantages and has a theoretical connection to existing work. In experiments, we show that BMVAE achieves state-of-the-art performance.
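The abstract contrasts MoE and PoE fusion of per-modality experts and describes selecting or combining them via Bayesian inference. Below is a minimal, hypothetical NumPy sketch, not the paper's BMVAE code, of the two standard ways to fuse Gaussian experts q_m(z | x_m), plus a softmax-of-evidence weighting used here only as a stand-in for a Bayesian posterior over experts; all function names and the toy numbers are assumptions for illustration.

```python
# Hypothetical illustration (not the paper's implementation): fusing per-modality
# Gaussian experts q_m(z | x_m) = N(mu_m, sigma_m^2) via PoE and MoE, with a
# softmax-of-evidence weighting standing in for a Bayesian posterior over experts.
import numpy as np

def poe(mus, sigmas, eps=1e-8):
    """Product of Gaussian experts: precisions add, means are precision-weighted."""
    precisions = 1.0 / (np.square(sigmas) + eps)      # shape (M, D)
    var = 1.0 / precisions.sum(axis=0)                # combined variance, shape (D,)
    mu = var * (precisions * mus).sum(axis=0)         # combined mean, shape (D,)
    return mu, np.sqrt(var)

def responsibility_weights(log_evidence):
    """Softmax over per-expert log evidence: a toy stand-in for a posterior over experts."""
    w = np.exp(log_evidence - log_evidence.max())
    return w / w.sum()

def moe_sample(mus, sigmas, weights, rng):
    """Mixture of experts: pick one expert with probability `weights`, then sample from it."""
    k = rng.choice(len(weights), p=weights)
    return mus[k] + sigmas[k] * rng.standard_normal(mus[k].shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two modalities, 4-dimensional latent: toy posterior parameters.
    mus = np.array([[0.0, 1.0, -0.5, 2.0],
                    [0.2, 0.8, -0.4, 1.5]])
    sigmas = np.array([[1.0, 0.5, 2.0, 1.0],
                       [0.8, 0.6, 1.5, 0.9]])
    print("PoE mean/std:", poe(mus, sigmas))
    w = responsibility_weights(np.array([-3.2, -2.7]))   # toy log evidences
    print("weights:", w, "MoE sample:", moe_sample(mus, sigmas, w, rng))
```

The PoE combination is the usual closed form for a product of Gaussians (precisions add), while the MoE path samples a single expert chosen by the weights; a learned, Bayesian selection mechanism of the kind the abstract describes would enter roughly where those weights are computed.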
ISSN: 0885-6125, 1573-0565
DOI: 10.1007/s10994-022-06272-y