OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning

Large-scale deep learning models contribute to significant performance improvements on varieties of downstream tasks. Current data and model parallelism approaches utilize model replication and partition techniques to support the distributed training of ultra-large models. However, directly deployin...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	arXiv.org 2023-05
Hauptverfasser:	Jiang, Youhe, Fu, Fangcheng, Miao, Xupeng, Nie, Xiaonan, Cui, Bin
Format:	Artikel
Sprache:	eng
Schlagworte:	Deep learning Parallel processing Scale models Training
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Schreiben Sie den ersten Kommentar!