Overlapping Communication With Computation in Parameter Server for Scalable DL Training
Scalability of distributed deep learning (DL) training with the parameter server (PS) architecture is often communication-constrained in large clusters. Recent efforts use a layer-by-layer strategy to overlap gradient communication with backward computation so as to reduce the impact of c...
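The abstract refers to layer-by-layer overlap of gradient communication with backward computation. The following is a minimal sketch of that general idea, not the paper's algorithm: as soon as one layer's gradient is ready during the backward pass, its push to the parameter server is launched on a background thread so that communication overlaps with the backward computation of the remaining layers. All names (`push_gradient`, `backward_layer`, `NUM_LAYERS`) are illustrative stand-ins.

```python
# Sketch of layer-by-layer overlap of gradient pushes with backward computation.
# Assumptions: per-layer gradients become available one at a time during the
# backward pass, and pushes to the parameter server can run concurrently.
from concurrent.futures import ThreadPoolExecutor
import time

NUM_LAYERS = 4  # hypothetical model depth


def backward_layer(layer_id):
    """Stand-in for computing the gradient of one layer."""
    time.sleep(0.05)          # pretend compute time
    return f"grad[{layer_id}]"


def push_gradient(grad):
    """Stand-in for sending one layer's gradient to the parameter server."""
    time.sleep(0.08)          # pretend network time
    print(f"pushed {grad}")


def backward_with_overlap():
    pending = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Backward pass visits layers from last to first.
        for layer_id in reversed(range(NUM_LAYERS)):
            grad = backward_layer(layer_id)                    # compute gradient
            pending.append(pool.submit(push_gradient, grad))   # overlap the push
        for fut in pending:                                    # wait for all pushes
            fut.result()


if __name__ == "__main__":
    backward_with_overlap()
```

In this sketch the push of layer i's gradient proceeds while the gradients of layers i-1, i-2, ... are still being computed, which is the overlap the abstract describes; real systems (e.g., PS frameworks with asynchronous push/pull) implement this inside the communication runtime rather than with Python threads.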
Published in: IEEE Transactions on Parallel and Distributed Systems, 2021-09, Vol. 32 (9), pp. 2144-2159
Main authors:
Format: Article
Language: English
Keywords:
Online access: Order full text