DISCO: Distributed Inference with Sparse Communications
Main authors: , , ,
Format: Article
Language: eng
Subjects:
Online access: Order full text
Abstract: Deep neural networks (DNNs) have great potential to solve many real-world
problems, but they usually require an extensive amount of computation and
memory. Deploying a large DNN model on a single resource-limited device with
small memory capacity is therefore very difficult. Distributed computing is a
common approach to reduce single-node memory consumption and to accelerate the
inference of DNN models. In this paper, we explore the "within-layer model
parallelism", which distributes the inference of each layer into multiple
nodes. In this way, the memory requirement can be distributed to many nodes,
making it possible to use several edge devices to infer a large DNN model. Due
to the dependency within each layer, data communications between nodes during
this parallel inference can be a bottleneck when the communication bandwidth is
limited. We propose a framework to train DNN models for Distributed Inference
with Sparse Communications (DISCO). We convert the problem of selecting which
subset of data to transmit between nodes into a model optimization problem, and
derive models with both computation and communication reduction when each layer
is inferred on multiple nodes. We show the benefit of the DISCO framework on a
variety of CV tasks such as image classification, object detection, semantic
segmentation, and image super resolution. The corresponding models include
important DNN building blocks such as convolutions and transformers. For
example, each layer of a ResNet-50 model can be inferred distributively across
two nodes with five times less data communication, roughly half the overall
computation, and half the per-node memory requirement, while achieving accuracy
comparable to the original ResNet-50 model. This also yields a 4.7 times overall
inference speedup.
DOI: 10.48550/arxiv.2302.11180
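The abstract describes within-layer model parallelism with sparsified inter-node communication. The following is a minimal NumPy sketch of that general idea, not the DISCO training method itself: it assumes a fully-connected layer split column-wise across two simulated nodes and a simple top-k magnitude rule for choosing which activations to transmit; names such as `COMM_KEEP` and `sparsify` are illustrative and do not come from the paper.

```python
# Hypothetical sketch: within-layer model parallelism with sparse communication.
# The layer split and the top-k selection rule are illustrative assumptions,
# not the optimization-based selection described in the abstract.
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_HID, D_OUT = 64, 128, 10
COMM_KEEP = 0.2  # fraction of each node's activations sent to the other node

# Layer 1 is split column-wise: each of the two nodes holds half the hidden units.
W1 = rng.standard_normal((D_IN, D_HID)) * 0.1
W2 = rng.standard_normal((D_HID, D_OUT)) * 0.1
W1_parts = np.split(W1, 2, axis=1)   # per-node shards of layer 1
W2_parts = np.split(W2, 2, axis=0)   # layer 2 consumes the full hidden vector

def node_forward(x, node_id):
    """Each node computes only its own shard of the hidden activations."""
    return np.maximum(x @ W1_parts[node_id], 0.0)

def sparsify(h, keep_frac):
    """Keep only the largest-magnitude activations for transmission (illustrative rule)."""
    k = max(1, int(keep_frac * h.size))
    out = np.zeros_like(h)
    idx = np.argsort(np.abs(h))[-k:]
    out[idx] = h[idx]
    return out

x = rng.standard_normal(D_IN)
h_local = [node_forward(x, i) for i in (0, 1)]

# Each node transmits a sparse copy of its activations; the receiver sees zeros elsewhere.
h_sent = [sparsify(h, COMM_KEEP) for h in h_local]

# View from node 0: its own full activations plus the sparse message from node 1.
h_node0 = np.concatenate([h_local[0], h_sent[1]])
y_node0 = h_node0[:D_HID // 2] @ W2_parts[0] + h_node0[D_HID // 2:] @ W2_parts[1]
print("node-0 logits:", np.round(y_node0, 3))
print("nonzero values actually transmitted to node 0:",
      int(np.count_nonzero(h_sent[1])), "of", D_HID // 2)
```

In this toy setup each node stores only half of the first layer's weights and receives only a small, sparse subset of the other node's activations; DISCO instead learns which activations to transmit as part of model training, rather than using a fixed magnitude rule.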