Communication Optimization Algorithms for Distributed Deep Learning Systems: A Survey
Saved in:
Published in: IEEE Transactions on Parallel and Distributed Systems, 2023-12, Vol. 34 (12), p. 3294-3308
Main authors: , ,
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Deep learning's widespread adoption in various fields has made distributed training across multiple computing nodes essential. However, frequent communication between nodes can significantly slow down training speed, creating a bottleneck in distributed training. To address this issue, researchers are focusing on communication optimization algorithms for distributed deep learning systems. In this paper, we propose a standard that systematically classifies all communication optimization algorithms based on mathematical modeling, which is not achieved by existing surveys in the field. We categorize existing works into four categories based on the optimization strategies of communication: communication masking, communication compression, communication frequency reduction, and hybrid optimization. Finally, we discuss potential future challenges and research directions in the field of communication optimization algorithms for distributed deep learning systems.
ISSN: 1045-9219 (print); 1558-2183 (electronic)
DOI: 10.1109/TPDS.2023.3323282
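
To make the abstract's categories concrete, below is a minimal sketch (not taken from the paper) of one communication compression strategy the survey covers: top-k gradient sparsification, where each worker transmits only the k largest-magnitude gradient entries instead of the full dense gradient. The function names and the 1% compression ratio are illustrative assumptions, not anything defined by the authors.

```python
# Minimal sketch of top-k gradient sparsification, one form of
# "communication compression" in distributed deep learning.
# Only the k largest-magnitude entries of a gradient are transmitted,
# reducing the volume of data exchanged between workers.
import numpy as np

def topk_compress(grad: np.ndarray, ratio: float = 0.01):
    """Keep the top `ratio` fraction of entries by magnitude; return (indices, values, shape)."""
    flat = grad.ravel()
    k = max(1, int(flat.size * ratio))
    # Indices of the k entries with the largest absolute value (unordered).
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx], grad.shape

def topk_decompress(idx: np.ndarray, values: np.ndarray, shape) -> np.ndarray:
    """Rebuild a dense gradient that is zero everywhere except the transmitted entries."""
    flat = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    flat[idx] = values
    return flat.reshape(shape)

# Example: compress a synthetic gradient, "send" only ~1% of its entries, then rebuild.
g = np.random.randn(1024, 1024).astype(np.float32)
idx, vals, shape = topk_compress(g, ratio=0.01)
g_hat = topk_decompress(idx, vals, shape)
print(f"sent {idx.size} of {g.size} entries ({idx.size / g.size:.1%})")
```

In practice, sparsification schemes like this are typically paired with error feedback, accumulating the untransmitted residual locally and adding it to the next step's gradient, so that aggressive compression does not degrade convergence.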