A Lightweight, Compiler-Assisted Register File Cache for GPGPU
Authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Abstract: | Modern GPUs require an enormous register file (RF) to hold the context of
thousands of active threads. The RF consumes considerable energy and is split
into multiple large banks to provide enough throughput. An RF caching mechanism
can therefore significantly improve the performance and energy consumption of
GPUs by avoiding reads from the large banks, which consume significant energy
and may cause port conflicts.
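To make the bank-read argument concrete, here is a minimal Python sketch, not Malekeh's design: a small LRU register cache placed in front of a banked RF. The bank count, cache size, interleaved register-to-bank mapping, single read port per bank, write-allocate behavior, and the `simulate` helper are all hypothetical assumptions chosen for illustration. Every source operand that hits in the cache skips a bank read, and fewer bank reads also leave fewer chances for two operands to collide on the same bank.

```python
from collections import OrderedDict

NUM_BANKS = 4      # hypothetical number of RF banks
CACHE_ENTRIES = 4  # hypothetical RF cache capacity, in registers


def bank_of(reg: int) -> int:
    # Assumed interleaved mapping: register r lives in bank r mod NUM_BANKS.
    return reg % NUM_BANKS


def _insert(cache: OrderedDict, reg: int) -> None:
    # Place reg at the MRU end and evict the LRU entry if over capacity.
    cache[reg] = None
    cache.move_to_end(reg)
    if len(cache) > CACHE_ENTRIES:
        cache.popitem(last=False)


def simulate(trace, use_cache=True):
    """Return (bank_reads, conflict_cycles) for a trace.

    trace: list of (dst, [src, ...]) register numbers, one per instruction.
    Each bank is assumed to have one read port, so k reads of the same bank
    by one instruction cost k - 1 extra cycles.
    """
    cache = OrderedDict()  # keys ordered from LRU to MRU
    bank_reads = 0
    conflict_cycles = 0
    for dst, srcs in trace:
        reads_per_bank = {}
        for src in srcs:
            if use_cache and src in cache:
                cache.move_to_end(src)  # hit: no bank access needed
                continue
            bank_reads += 1
            b = bank_of(src)
            reads_per_bank[b] = reads_per_bank.get(b, 0) + 1
            if use_cache:
                _insert(cache, src)  # fill the cache on a miss
        conflict_cycles += sum(n - 1 for n in reads_per_bank.values())
        if use_cache:
            _insert(cache, dst)  # assumed write-allocate for results
    return bank_reads, conflict_cycles


if __name__ == "__main__":
    trace = [(8, [0, 4]), (9, [8, 0]), (10, [9, 4]), (11, [10, 8])]
    print(simulate(trace, use_cache=False))  # (8, 2)
    print(simulate(trace))                   # (3, 1)
```

On this toy trace the cache cuts bank reads from 8 to 3 and conflict cycles from 2 to 1. Plain LRU still evicts r8 just before its next read; knowing each value's reuse distance, which is the information Malekeh exploits, could help avoid such misses.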
This paper introduces Malekeh, an energy-efficient RF caching mechanism that
repurposes an existing component of the GPU's RF to operate as a cache in
addition to its original function. In this way, Malekeh minimizes the
overhead of adding an RF cache to GPUs. In addition, Malekeh uses an issue
scheduling policy that exploits the reuse distance of the values in the RF
cache and is controlled by a dynamic algorithm. The goal is to adapt the issue
policy to the program's runtime characteristics so as to maximize both GPU
performance and the hit ratio of the RF cache. The reuse distance is
approximated by the compiler using profiling and is used at run time by the
proposed caching scheme. We show that Malekeh reduces the number of reads to
the RF banks by 46.4% and the dynamic energy of the RF by 28.3%. It also
improves performance by 6.1% while adding only 2KB of extra storage per core
to the baseline 256KB RF, a negligible overhead of 0.78% (2KB / 256KB). |
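The abstract states only that reuse distance is approximated by the compiler through profiling and consumed by the caching scheme at run time. The Python sketch below shows one plausible form of such a profiling pass under stated assumptions: reuse distance is taken as the number of dynamic instructions between the write of a register and its first subsequent read, averaged per static instruction (`pc`), and a hypothetical threshold rule turns those averages into per-instruction caching hints. The function names, trace format, and threshold are illustrative inventions; the paper's actual definition and its dynamic issue-scheduling controller are not modeled here.

```python
from collections import defaultdict


def profile_reuse_distance(trace):
    """Average write-to-first-read distance per static instruction.

    trace: list of (pc, dst, [src, ...]) in dynamic execution order;
    dst may be None for instructions without a register result.
    Values that are overwritten or never read are not counted.
    """
    last_write = {}              # reg -> (dynamic index, pc of producer)
    totals = defaultdict(int)    # pc -> sum of observed distances
    counts = defaultdict(int)    # pc -> number of observed reuses
    for i, (pc, dst, srcs) in enumerate(trace):
        for src in srcs:         # reads happen before this instruction writes
            if src in last_write:
                w_idx, w_pc = last_write.pop(src)  # first read only
                totals[w_pc] += i - w_idx
                counts[w_pc] += 1
        if dst is not None:
            last_write[dst] = (i, pc)
    return {pc: totals[pc] / counts[pc] for pc in counts}


def cache_hints(avg_distance, threshold=2):
    # Hypothetical rule: keep a result in the RF cache only if it is
    # expected to be reused within `threshold` instructions.
    return {pc for pc, d in avg_distance.items() if d <= threshold}


if __name__ == "__main__":
    trace = [
        (0, 1, [0]),     # pc0 writes r1
        (1, 2, [1]),     # pc1 reads r1 (distance 1), writes r2
        (2, 3, [5]),     # pc2 writes r3
        (3, 4, [6]),     # pc3 writes r4 (never read)
        (4, 5, [2, 3]),  # pc4 reads r2 (distance 3) and r3 (distance 2)
    ]
    dist = profile_reuse_distance(trace)
    print(dist)               # {0: 1.0, 1: 3.0, 2: 2.0}
    print(cache_hints(dist))  # {0, 2}
```

Here the results of pc 0 and pc 2 are reused within two instructions and would be hinted as worth caching, while pc 1's result waits three instructions. Keying the averages by `pc` lets the hints be attached to static instructions, which is what a compiler is able to annotate.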
DOI: | 10.48550/arxiv.2310.17501 |