MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

We introduce MoMa, a novel modality-aware mixture-of-experts (MoE) architecture designed for pre-training mixed-modal, early-fusion language models. MoMa processes images and text in arbitrary sequences by dividing expert modules into modality-specific groups. These groups exclusively process designated tokens while employing learned routing within each group to maintain semantically informed adaptivity.
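The abstract sketches the core mechanism: tokens are partitioned by modality, each modality gets its own pool of feed-forward experts, and a learned router assigns tokens to experts within their group. The following minimal PyTorch sketch illustrates that idea only; it is not the authors' implementation. It assumes simple top-1 token-choice routing within each group, and the names (ModalityAwareMoE, n_text_experts, n_image_experts) are invented for illustration.

import torch
import torch.nn as nn

class ModalityAwareMoE(nn.Module):
    # Hypothetical sketch of modality-aware MoE: one expert group per
    # modality, with a learned router inside each group.
    def __init__(self, d_model: int, n_text_experts: int, n_image_experts: int):
        super().__init__()
        def make_group(n: int):
            experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, 4 * d_model),
                              nn.GELU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(n)
            ])
            router = nn.Linear(d_model, n)  # learned per-group router
            return experts, router
        self.text_experts, self.text_router = make_group(n_text_experts)
        self.image_experts, self.image_router = make_group(n_image_experts)

    def _route(self, x, experts, router):
        # Top-1 token-choice routing within one modality group
        # (a simplification; the paper explores richer routing schemes).
        gates = router(x).softmax(dim=-1)   # (tokens, n_experts)
        weight, idx = gates.max(dim=-1)     # chosen expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(experts):
            mask = idx == e
            if mask.any():
                out[mask] = weight[mask, None] * expert(x[mask])
        return out

    def forward(self, x, modality):
        # x: (seq, d_model); modality: (seq,) with 0 = text, 1 = image.
        # Tokens only ever reach experts in their own modality group.
        out = torch.empty_like(x)
        text, image = modality == 0, modality == 1
        out[text] = self._route(x[text], self.text_experts, self.text_router)
        out[image] = self._route(x[image], self.image_experts, self.image_router)
        return out

# Usage on a mixed-modal token sequence:
moe = ModalityAwareMoE(d_model=16, n_text_experts=4, n_image_experts=2)
tokens = torch.randn(10, 16)
modality = torch.tensor([0, 0, 1, 1, 0, 1, 0, 0, 1, 0])
out = moe(tokens, modality)  # (10, 16)

The key design choice this mirrors is that the routing decision space is restricted per modality, so text tokens never compete with image tokens for the same experts.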

Bibliographic Details
Main Authors: Lin, Xi Victoria; Shrivastava, Akshat; Luo, Liang; Iyer, Srinivasan; Lewis, Mike; Ghosh, Gargi; Zettlemoyer, Luke; Aghajanyan, Armen
Format: Article
Language: English