Efficient distributed algorithms for Convolutional Neural Networks

Bibliographic Details
Published in:	arXiv.org, 2021-05
Main Authors:	Li, Rui; Xu, Yufan; Sukumaran-Rajam, Aravind; Rountev, Atanas; Sadayappan, P
Format:	Article
Language:	English
Subjects:	Algorithms; Artificial neural networks; Computer Science - Distributed, Parallel, and Cluster Computing; Data communication; Distributed memory; Multiplication; Neural networks
Online Access:	Full text

Description
Summary:	Several efficient distributed algorithms have been developed for matrix-matrix multiplication: the 3D algorithm, the 2D SUMMA algorithm, and the 2.5D algorithm. Each of these algorithms was independently conceived, and they trade off the memory needed per node against the inter-node data communication volume. The convolutional neural network (CNN) computation may be viewed as a generalization of matrix multiplication combined with neighborhood stencil computations. We develop communication-efficient distributed-memory algorithms for CNNs that are analogous to the 2D/2.5D/3D algorithms for matrix-matrix multiplication.
ISSN:	2331-8422
DOI:	10.48550/arxiv.2105.13480
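
Note on the summary: the statement that a CNN layer generalizes matrix multiplication with an added neighborhood stencil can be illustrated with a small single-node sketch. This is not the distributed algorithm from the article. The index names (k, c, h, w, r, s), the stride of 1, the absence of padding, and the helper names conv2d and matmul are all assumptions chosen for illustration. The sketch shows that with a 1x1 kernel the stencil loops vanish and the layer reduces to an ordinary matrix product.

```python
def conv2d(inp, ker):
    """Direct CNN layer (illustrative; stride 1, no padding assumed).

    Out[k][h][w] = sum over c, r, s of In[c][h+r][w+s] * Ker[k][c][r][s]
    inp is indexed [c][h][w]; ker is indexed [k][c][r][s].
    """
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    K, R, S = len(ker), len(ker[0][0]), len(ker[0][0][0])
    out_h, out_w = H - R + 1, W - S + 1
    out = [[[0] * out_w for _ in range(out_h)] for _ in range(K)]
    for k in range(K):
        for h in range(out_h):
            for w in range(out_w):
                acc = 0
                for c in range(C):          # contraction index, as in matmul
                    for r in range(R):      # stencil offset over rows
                        for s in range(S):  # stencil offset over columns
                            acc += inp[c][h + r][w + s] * ker[k][c][r][s]
                out[k][h][w] = acc
    return out


def matmul(a, b):
    """Plain matrix product of a (M x C) and b (C x N)."""
    return [[sum(a[i][c] * b[c][j] for c in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical toy data: C=2 input channels of size 2x2, K=3 output channels.
inp = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
ker = [[[[1]], [[2]]], [[[3]], [[4]]], [[[0]], [[1]]]]  # R = S = 1

# With a 1x1 kernel, the layer equals weights (K x C) times input (C x H*W).
weights = [[ker[k][c][0][0] for c in range(2)] for k in range(3)]
flat_in = [[v for row in chan for v in row] for chan in inp]
flat_out = [[v for row in chan for v in row] for chan in conv2d(inp, ker)]
assert flat_out == matmul(weights, flat_in)
```

For kernels larger than 1x1, the r and s loops read neighboring input elements. In a distributed setting this would require exchanging boundary data between nodes in addition to the matmul-style communication.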