MARS: Memory Aware Reordered Source

Bibliographic details
Main authors: Bhati, Ishwar; Dhawan, Udit; Gaur, Jayesh; Subramoney, Sreenivas; Wang, Hong
Format: Article
Language: English
Description
Summary: Memory bandwidth is critical in today's high-performance computing systems. It is particularly important for GPU workloads such as 3D gaming, imaging and perceptual computing, and GPGPU, due to their data-intensive nature. As the number of threads and data streams in GPUs increases with each generation, memory efficiency, along with high available memory bandwidth, is crucial to achieving the desired performance. In the presence of multiple concurrent data streams, the inherent locality of a single data stream is often lost as the streams are interleaved while moving through multiple levels of the memory system. In DRAM-based main memory, this poor request locality reduces row-buffer reuse, resulting in underutilized and inefficient memory bandwidth. In this paper we propose the Memory-Aware Reordered Source (MARS) architecture to address the memory inefficiency arising from highly interleaved data streams. The key idea of MARS is that, with a sufficiently large lookahead before the main memory, data streams can be reordered based on their row-buffer address to regain the lost locality and improve memory efficiency. We show that MARS improves achieved memory bandwidth by 11% for a set of synthetic microbenchmarks. Moreover, MARS does so without any specific knowledge of the memory configuration.
DOI:10.48550/arxiv.1808.03518
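
Illustrative sketch (not from the paper): the summary's key idea, reordering interleaved requests by row-buffer address within a lookahead window, can be imitated with a toy model. Everything below is an assumption made for illustration: a single DRAM bank with an open-row policy, a hypothetical 2048-byte row (`ROW_BYTES`), fixed non-overlapping windows, and the helper names `row_of`, `row_hits`, `reorder_by_row`, and `interleave`. It does not reproduce the MARS hardware or its reported 11% result.

```python
ROW_BYTES = 2048  # assumed row-buffer size in bytes (hypothetical)


def row_of(addr, row_bytes=ROW_BYTES):
    """Map a byte address to a DRAM row index (toy mapping, single bank)."""
    return addr // row_bytes


def row_hits(addrs, row_bytes=ROW_BYTES):
    """Count requests that find their row already open (open-row policy)."""
    hits, open_row = 0, None
    for addr in addrs:
        row = row_of(addr, row_bytes)
        if row == open_row:
            hits += 1
        open_row = row
    return hits


def reorder_by_row(addrs, window, row_bytes=ROW_BYTES):
    """Within each fixed window of `window` requests, group requests that
    target the same row, keeping first-seen row order and original order
    inside each group."""
    reordered = []
    for start in range(0, len(addrs), window):
        groups = {}  # row -> requests; dicts keep insertion order
        for addr in addrs[start:start + window]:
            groups.setdefault(row_of(addr, row_bytes), []).append(addr)
        for group in groups.values():
            reordered.extend(group)
    return reordered


def interleave(streams):
    """Round-robin interleaving of equally long request streams."""
    return [addr for step in zip(*streams) for addr in step]


# Four sequential streams of 64-byte requests, each confined to its own row.
streams = [[s * (1 << 20) + i * 64 for i in range(16)] for s in range(4)]
interleaved = interleave(streams)
reordered = reorder_by_row(interleaved, window=16)

print("row hits, interleaved:", row_hits(interleaved))
print("row hits, reordered:  ", row_hits(reordered))
```

In this toy setup the interleaved order switches rows on every request, while grouping inside a 16-request window restores runs of same-row accesses. A larger window lengthens those runs, which mirrors the summary's point about needing a sufficiently large lookahead.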