Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators
Main authors: , , ,
Format: Article
Language: English
Keywords:
Online access: Order full text
Abstract: We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications, focused on exploring distributed scheduling optimizations for Deep Learning (DL) workloads to obtain the best performance in terms of latency and power efficiency. Our cluster remained modular throughout the experiments: the implementations consist of up to 12 Zynq-7020 chip-based boards and 5 UltraScale+ MPSoC FPGA boards connected through an Ethernet switch, and the cluster evaluates the configurable Deep Learning Accelerator (DLA) Versatile Tensor Accelerator (VTA). This adaptable distributed architecture is distinguished by its capacity to evaluate and manage neural network workloads in numerous configurations, which enables users to conduct multiple experiments tailored to their specific application needs. The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the computation graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
DOI: 10.48550/arxiv.2305.18332
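The pipeline-partitioning idea mentioned in the abstract can be illustrated with a minimal sketch. The code below is not taken from the paper: the layer names, MAC counts, and the greedy proportional-allocation heuristic are assumptions chosen only to show how more FPGA boards might be assigned to the most compute-intensive layers of a pipelined NN graph.

```python
# Hypothetical sketch: partition a NN computation graph into pipeline stages
# across FPGA nodes, giving more nodes to the most compute-intensive layers.
# Layer names and cost figures are illustrative, not from the paper.

from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    macs: int  # multiply-accumulate count, used as a proxy for compute cost

def assign_nodes(layers, total_nodes):
    """Distribute FPGA nodes to layers proportionally to their compute cost,
    guaranteeing at least one node (one pipeline stage) per layer."""
    alloc = {l.name: 1 for l in layers}          # every layer gets one node
    remaining = total_nodes - len(layers)
    for _ in range(remaining):
        # Hand the next node to the layer with the highest cost per node
        heaviest = max(layers, key=lambda l: l.macs / alloc[l.name])
        alloc[heaviest.name] += 1
    return alloc

if __name__ == "__main__":
    # Illustrative CNN-like workload mapped onto a 12-board cluster
    graph = [
        Layer("conv1", 120_000_000),
        Layer("conv2", 450_000_000),
        Layer("conv3", 300_000_000),
        Layer("fc",     30_000_000),
    ]
    print(assign_nodes(graph, total_nodes=12))
    # -> {'conv1': 2, 'conv2': 5, 'conv3': 4, 'fc': 1}
```

In a real deployment on a cluster like the one described, the MAC counts would presumably be replaced by profiled per-layer latencies on the target VTA configuration, and each allocation mapped to specific Zynq-7020 or UltraScale+ nodes; the greedy proportional split is only one plausible heuristic.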