Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores

Single-issue processor cores are very energy efficient but suffer from the von Neumann bottleneck, in that they must explicitly fetch and issue the loads/storse necessary to feed their ALU/FPU. Each instruction spent on moving data is a cycle not spent on computation, limiting ALU/FPU utilization to...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on computers 2021-02, Vol.70 (2), p.212-227
Hauptverfasser:	Schuiki, Fabian, Zaruba, Florian, Hoefler, Torsten, Benini, Luca
Format:	Artikel
Sprache:	eng
Schlagworte:	Clusters Decoding Energy efficiency energy-aware systems Hardware Kernel Lightweight micro-architecture implementation considerations Microprocessors Multicore processing Parallel architectures Power consumption Registers RISC Semantics Utilization
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Single-issue processor cores are very energy efficient but suffer from the von Neumann bottleneck, in that they must explicitly fetch and issue the loads/storse necessary to feed their ALU/FPU. Each instruction spent on moving data is a cycle not spent on computation, limiting ALU/FPU utilization to 33 percent on reductions. We propose "Stream Semantic Registers" to boost utilization and increase energy efficiency. SSR is a lightweight, non-invasive RISC-V ISA extension which implicitly encodes memory accesses as register reads/writes, eliminating a large number of loads/stores. We implement the proposed extension in the RTL of an existing multi-core cluster and synthesize the design for a modern 22 nm technology. Our extension provides a significant, 2x to 5x, architectural speedup across different kernels at a small 11 percent increase in core area. Sequential code runs 3x faster on a single core, and 3x fewer cores are needed in a cluster to achieve the same performance. The utilization increase to almost 100 percent in leads to a 2x energy efficiency improvement in a multi-core cluster. The extension reduces instruction fetches by up to 3.5x and instruction cache power consumption by up to 5.6x. Compilers can automatically map loop nests to SSRs, making the changes transparent to the programmer.
ISSN:	0018-9340 1557-9956
DOI:	10.1109/TC.2020.2987314