LocMoE: A Low-Overhead MoE for Large Language Model Training

The Mixtures-of-Experts (MoE) model is a widely used distributed and integrated learning method for large language models (LLMs), favored for its ability to sparsify and expand models efficiently. However, the performance of MoE is limited by load imbalance and the high latency of All-to-All communication…
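
As a rough illustration only (this sketch is not from the paper, and every name in it is hypothetical), the load-imbalance problem the abstract mentions can be seen in a few lines of Python: a softmax gate with top-1 routing typically sends uneven numbers of tokens to each expert, which is the imbalance MoE training systems such as LocMoE aim to mitigate.

```python
import numpy as np

# Minimal illustrative sketch (not from the paper): top-1 MoE gating.
# Each token is routed to the expert with the highest gate score;
# counting tokens per expert exposes the load imbalance the abstract
# describes. All dimensions and weights here are arbitrary.

rng = np.random.default_rng(0)
num_tokens, hidden_dim, num_experts = 4096, 64, 8

tokens = rng.standard_normal((num_tokens, hidden_dim))
gate_w = rng.standard_normal((hidden_dim, num_experts))  # gating weights

logits = tokens @ gate_w
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)   # softmax over experts
assignment = probs.argmax(axis=1)           # top-1 expert per token

counts = np.bincount(assignment, minlength=num_experts)
print("tokens per expert:", counts)
print("max/mean load ratio: %.2f" % (counts.max() / counts.mean()))
```

With untrained gate weights the max/mean load ratio is usually well above 1, meaning some experts receive far more tokens than others; in distributed training this shows up as stragglers and costly All-to-All traffic.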

Bibliographic Details
Published in: arXiv.org 2024-05
Main authors: Li, Jing; Sun, Zhijie; He, Xuan; Zeng, Li; Lin, Yi; Li, Entong; Zheng, Binfan; Zhao, Rongqian; Chen, Xin
Format: Article
Language: English
Subjects:
Online access: Full text