A$^3$PIM: An Automated, Analytic and Accurate Processing-in-Memory Offloader
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | The performance gap between memory and processor has grown rapidly.
Consequently, the energy and wall-clock time costs associated with moving data
between the CPU and main memory dominate the overall computational cost. The
Processing-in-Memory (PIM) paradigm emerges as a promising architecture that
mitigates the need for extensive data movements by strategically positioning
computing units proximate to the memory. Despite the abundant efforts devoted
to building a robust and highly-available PIM system, identifying PIM-friendly
segments of applications poses significant challenges due to the lack of a
comprehensive tool to evaluate the intrinsic memory access pattern of the
segment.
To tackle this challenge, we propose A$^3$PIM: an Automated, Analytic and
Accurate Processing-in-Memory offloader. We systematically consider the
cross-segment data movement and the intrinsic memory access pattern of each
code segment via a static code analyzer. We evaluate A$^3$PIM across a wide range
of real-world workloads including GAP and PrIM benchmarks and achieve an
average speedup of 2.63x and 4.45x (up to 7.14x and 10.64x) when compared to
CPU-only and PIM-only executions, respectively. |
---|---|
DOI: | 10.48550/arxiv.2402.18592 |