Overlapping Communication With Computation in Parameter Server for Scalable DL Training

Scalability of distributed deep learning (DL) training with the parameter server (PS) architecture is often communication constrained in large clusters. Recent efforts use a layer-by-layer strategy to overlap gradient communication with backward computation so as to reduce the impact of communication.
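
The layer-by-layer overlap summarized above can be illustrated with a minimal sketch (not the paper's implementation): each parameter registers a gradient hook so that its push toward the parameter server is issued as soon as backward produces that layer's gradient, instead of waiting for the whole backward pass. Here push_to_ps is a hypothetical stand-in for a real PS client call, and a thread pool stands in for asynchronous network sends.

from concurrent.futures import ThreadPoolExecutor

import torch
import torch.nn as nn

def push_to_ps(name: str, grad: torch.Tensor) -> None:
    # Hypothetical placeholder for an asynchronous send to the parameter server.
    pass

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))
executor = ThreadPoolExecutor(max_workers=2)
pending = []

for name, param in model.named_parameters():
    # The hook fires per parameter as soon as its gradient is ready,
    # i.e., during the backward pass rather than after it finishes.
    param.register_hook(
        lambda grad, name=name: pending.append(executor.submit(push_to_ps, name, grad))
    )

x = torch.randn(32, 1024)
loss = model(x).sum()
loss.backward()        # gradient pushes overlap with this call
for f in pending:
    f.result()         # wait for outstanding pushes before the next iteration

Under this sketch, gradients for the later layers are already in flight while earlier layers are still computing theirs, which is the overlap the abstract refers to.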

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, 2021-09, Vol. 32 (9), pp. 2144-2159
Authors: Wang, Shaoqi; Pi, Aidi; Zhou, Xiaobo; Wang, Jun; Xu, Cheng-Zhong
Format: Article
Language: English