Deep Reinforcement Learning Acceleration for Real-Time Edge Computing Mixed Integer Programming Problems

Bibliographic Details
Published in:	IEEE Access, 2022, Vol. 10, p. 18526-18543
Main Authors:	Gerogiannis, Gerasimos; Birbas, Michael; Leftheriotis, Aimilios; Mylonas, Eleftherios; Tzanis, Nikolaos; Birbas, Alexios
Format:	Article
Language:	English
Subjects:	Accelerator; Algorithms; Artificial neural networks; Computation offloading; Deep learning; Deep reinforcement learning; Design modifications; Edge computing; Field programmable gate arrays; FPGA; High level synthesis; Inference; Inference algorithms; Integer programming; Machine learning; Mixed integer; Mixed integer programming; Mobile computing; Parallel processing; Real time; Real-time systems; Resource allocation; Resource management; Task analysis; Training
Online Access:	Full text

Description
Summary:	In this work, we present the design and implementation of an ultra-low latency Deep Reinforcement Learning (DRL) FPGA-based accelerator for addressing hard real-time Mixed Integer Programming problems. The accelerator exhibits ultra-low latency performance for both training and inference operations, enabled by training-inference parallelism, pipelined training, on-chip weights and replay memory, multi-level replication-based parallelism, and DRL algorithmic modifications such as distribution of training over time. The design principles can be extended to support hardware acceleration for other relevant DRL algorithms (embedding the experience replay technique) with hard real-time constraints. We evaluate the accuracy of the accelerator in a task offloading and resource allocation problem stemming from a Mobile Edge Computing (MEC/5G) scenario. The design has been implemented on a Xilinx Zynq UltraScale+ MPSoC ZCU104 evaluation kit using High Level Synthesis. The accelerator achieves near-optimal performance and exhibits a 10-fold decrease in training-inference execution latency when compared to a high-end CPU-based implementation.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2022.3147674