Early-Exit meets Model-Distributed Inference at Edge Networks
Format: Article
Language: English
Abstract: Distributed inference techniques can be broadly classified into data-distributed and model-distributed schemes. In data-distributed inference (DDI), each worker carries the entire deep neural network (DNN) model but processes only a subset of the data. However, feeding the data to the workers incurs high communication costs, especially when the data is large. An emerging paradigm is model-distributed inference (MDI), where each worker carries only a subset of the DNN layers. In MDI, a source device that holds the data processes a few layers of the DNN and sends the output to a neighboring device, i.e., offloads the remaining layers. This process ends when all layers have been processed in a distributed manner. In this paper, we investigate the design and development of MDI with early exit, which advocates that some data need not traverse all the layers of a model to reach the desired accuracy, i.e., the model can be exited before all layers are processed once the target accuracy is reached. We design a framework, MDI-Exit, that adaptively determines early-exit and offloading policies as well as data admission at the source. Experimental results on a real-life testbed of NVIDIA Jetson Nano edge devices show that MDI-Exit processes more data when the accuracy is fixed and achieves higher accuracy for a fixed data rate.
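
The abstract describes two mechanisms that MDI-Exit combines: partitioning a DNN's layers across edge devices, and exiting early once an intermediate classifier is confident enough. The following is a minimal sketch of that combination, assuming a linear pipeline of workers and a softmax-confidence threshold as the exit criterion; the names `Stage`, `run_pipeline`, and `confidence_threshold` are illustrative and not taken from the paper.

```python
# Minimal sketch of model-distributed inference (MDI) with early exit.
# Hypothetical structure; not the paper's MDI-Exit implementation.
import torch
import torch.nn as nn

class Stage(nn.Module):
    """One worker's model partition: a few layers plus an early-exit head."""
    def __init__(self, dim, num_classes):
        super().__init__()
        self.layers = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.exit_head = nn.Linear(dim, num_classes)  # side branch for early exit

    def forward(self, x):
        h = self.layers(x)
        probs = torch.softmax(self.exit_head(h), dim=-1)
        return h, probs

def run_pipeline(stages, x, confidence_threshold=0.9):
    """Run stages in order; stop as soon as an exit head is confident.

    In MDI each stage would live on a different edge device, so passing
    `h` to the next stage models offloading activations over the network.
    """
    for i, stage in enumerate(stages):
        h, probs = stage(x)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= confidence_threshold:  # target confidence reached:
            return pred.item(), i                # exit, skip remaining layers
        x = h                                    # offload to the next worker
    return pred.item(), len(stages) - 1          # final exit after all layers

if __name__ == "__main__":
    torch.manual_seed(0)
    stages = [Stage(dim=16, num_classes=10) for _ in range(4)]  # 4 workers
    pred, exit_at = run_pipeline(stages, torch.randn(1, 16))
    print(f"class {pred}, exited at stage {exit_at}")
```

In the paper's setting, the hand-off between stages is a network transfer between edge devices, and the exit and offloading decisions, along with data admission at the source, are adapted at run time rather than fixed as in this sketch.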
DOI: 10.48550/arxiv.2408.05247