Self-Optimizing and Self-Programming Computing Systems: A Combined Compiler, Complex Networks, and Machine Learning Approach

Full description

Bibliographic details
Published in:	IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2019-06, Vol. 27 (6), pp. 1416-1427
Main authors:	Xiao, Yao; Nazarian, Shahin; Bogdan, Paul
Format:	Article
Language:	English
Subjects:	Accelerators; Algorithms; Artificial neural networks; Computation; Distributed Q-learning; domain-specific system-on-chip (DSSoC); Graphics processing units; Hardware; heterogeneous systems; Machine learning; Motion perception; Multicore processing; Multiplication; network-on-chip (NoC); Neural networks; neural networks (NNs); Optimization; Processor scheduling; Programming; Runtime; self-optimizing; self-programming; software-defined hardware (SDH); System on chip; Target detection; Task analysis

Description
Summary:	There exists an urgent need for determining the right amount and type of specialization while making a heterogeneous system as programmable and flexible as possible. Therefore, in this paper, we pioneer a self-optimizing and self-programming computing system (SOSPCS) design framework that achieves both programmability and flexibility and exploits computing heterogeneity [e.g., CPUs, GPUs, and hardware accelerators (HWAs)]. First, at compile time, we form a task pool consisting of hybrid tasks with different processing element (PE) affinities according to target applications. Tasks preferred to be executed on GPUs or accelerators are detected from target applications by neural networks. Tasks suitable to run on CPUs are formed by community detection to minimize data movement overhead. Next, a distributed reinforcement learning-based approach is used at runtime to allow agents to map the tasks onto the network-on-chip-based heterogeneous PEs by learning an optimal policy based on Q values in the environment. We have conducted experiments on a heterogeneous platform consisting of CPUs, GPUs, and HWAs with deep learning algorithms such as matrix multiplication, ReLU, and sigmoid functions. We concluded that SOSPCS provides a performance improvement of up to 4.12× and an energy reduction of up to 3.24× compared to the state-of-the-art approaches.
ISSN:	1063-8210 1557-9999
DOI:	10.1109/TVLSI.2019.2897650
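
Illustrative note: the summary describes a runtime step in which agents learn a task-to-PE mapping from Q values. The sketch below is not the authors' implementation; it is a minimal single-agent tabular Q-learning example, assuming a hypothetical set of three task kinds, three PE types, and an invented cost table (`TASKS`, `PES`, `COST`, `train`, and `policy` are all names made up for illustration). The paper's approach is distributed and network-on-chip aware, which this toy omits.

```python
import random

# Hypothetical PE types and task kinds (assumptions, not taken from the paper).
PES = ["CPU", "GPU", "HWA"]
TASKS = ["matmul", "relu", "control"]

# Invented execution costs per (task, PE) pair; lower is better.
COST = {
    ("matmul", "CPU"): 9.0, ("matmul", "GPU"): 2.0, ("matmul", "HWA"): 1.0,
    ("relu", "CPU"): 3.0, ("relu", "GPU"): 1.0, ("relu", "HWA"): 2.0,
    ("control", "CPU"): 1.0, ("control", "GPU"): 6.0, ("control", "HWA"): 8.0,
}


def train(episodes=5000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    """Learn Q(task, PE) with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    q = {(t, p): 0.0 for t in TASKS for p in PES}
    task = rng.choice(TASKS)
    for _ in range(episodes):
        # Explore with probability epsilon, otherwise pick the best-known PE.
        if rng.random() < epsilon:
            pe = rng.choice(PES)
        else:
            pe = max(PES, key=lambda p: q[(task, p)])
        reward = -COST[(task, pe)]          # reward is the negative cost
        next_task = rng.choice(TASKS)       # next task arrives at random
        best_next = max(q[(next_task, p)] for p in PES)
        # Standard Q-learning update toward reward + discounted best next value.
        q[(task, pe)] += alpha * (reward + gamma * best_next - q[(task, pe)])
        task = next_task
    return q


def policy(q):
    """Greedy mapping: for each task kind, the PE with the highest Q value."""
    return {t: max(PES, key=lambda p: q[(t, p)]) for t in TASKS}


if __name__ == "__main__":
    print(policy(train()))
```

Under these invented costs the greedy policy maps each task kind to its cheapest PE; with real workloads the reward would instead come from measured latency or energy.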