G-NMP: Accelerating Graph Neural Networks with DIMM-based Near-Memory Processing
| Field | Value |
|---|---|
| Published in | Journal of Systems Architecture, 2022-08, Vol. 129, p. 102602, Article 102602 |
| Authors | , , , , , , , |
| Format | Article |
| Language | English |
| Online access | Full text |
Abstract: Graph Neural Networks (GNNs) are of great value in numerous applications and promote the development of cognitive intelligence, due to their capability of modeling non-Euclidean data structures. However, the inherent irregularity makes GNNs memory-bound, and the hybrid computing paradigm of GNNs poses significant challenges for efficient deployment on existing hardware architectures. Near-Memory Processing (NMP) is a promising solution for alleviating the memory wall problem. In this paper, we present G-NMP, a practical and efficient DIMM-based NMP solution for accelerating GNNs, which accelerates both sparse Aggregation and dense Combination computations on DIMMs for the first time. We propose a novel G-NMP hardware architecture to efficiently exploit rank-level memory parallelism, and the G-ISA instructions to significantly reduce host memory requests. We conduct several data flow optimizations on G-NMP to improve memory-compute overlap and to realize efficient matrix computation. We then develop an adaptive data allocation strategy for diverse vector sizes to further exploit feature-level parallelism. We also propose a novel memory request scheduling method to achieve flexible and low-overhead DRAM ownership transition between the host and G-NMP. Overall, G-NMP achieves consistent performance advantages across diverse GNN models and datasets, and offers 1.46× overall performance and 1.29× energy efficiency on average compared with the state-of-the-art work.
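For readers unfamiliar with the hybrid computing paradigm the abstract refers to, the following minimal Python sketch (our illustration, not code from the paper; the function name, shapes, and the ReLU non-linearity are assumptions) shows how one GNN layer interleaves a sparse, irregular Aggregation with a dense, regular Combination, which is why the workload mixes memory-bound and compute-bound phases:

```python
# Illustrative sketch of the hybrid GNN computing paradigm: per-layer
# propagation combines a sparse Aggregation (SpMM over the adjacency
# matrix, dominated by irregular memory accesses) with a dense
# Combination (GEMM with the weight matrix, compute-regular).
import numpy as np
import scipy.sparse as sp

def gnn_layer(adj: sp.csr_matrix, feats: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One GNN layer: sparse Aggregation followed by dense Combination."""
    aggregated = adj @ feats          # Aggregation: sparse x dense (memory-bound)
    combined = aggregated @ weight    # Combination: dense x dense (compute-bound)
    return np.maximum(combined, 0.0)  # non-linearity (ReLU, assumed for the sketch)

# Toy usage: 4-node ring graph, 8-dim features, 8 -> 4 weight matrix.
adj = sp.csr_matrix(np.array([[0, 1, 0, 1],
                              [1, 0, 1, 0],
                              [0, 1, 0, 1],
                              [1, 0, 1, 0]], dtype=np.float32))
feats = np.random.rand(4, 8).astype(np.float32)
weight = np.random.rand(8, 4).astype(np.float32)
out = gnn_layer(adj, feats, weight)  # shape (4, 4)
```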
Highlights:
• G-NMP exploits rank-level parallelism and leverages off-the-shelf CPU and DRAM chips.
• The G-ISA instruction set reduces memory requests and alleviates C/A bandwidth pressure.
• Data flow optimization improves memory-compute overlap and reduces memory accesses.
• Adaptive data allocation ensures memory parallelism for diverse vector sizes (sketched below).
• A flexible and low-overhead memory request scheduling method between the CPU and G-NMP.
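As a rough illustration of the adaptive-data-allocation highlight, here is a hypothetical sketch; the constants NUM_RANKS and RANK_CHUNK and the allocate function are invented for illustration and are not the paper's algorithm. The idea it conveys: a short feature vector stays within a single rank so that other ranks remain free to serve other vectors, while a long vector is striped across ranks so that all ranks contribute bandwidth to it in parallel.

```python
# Hypothetical rank-aware allocation sketch (our assumption, not the
# paper's algorithm): adapt the placement policy to the vector size so
# that memory parallelism is preserved for both short and long vectors.
NUM_RANKS = 4    # assumed number of DRAM ranks on the DIMM
RANK_CHUNK = 16  # assumed number of elements a rank serves per access

def allocate(vector_len: int) -> list[tuple[int, int, int]]:
    """Return (rank, start, end) slices covering one feature vector."""
    if vector_len <= RANK_CHUNK:
        # Small vector: keep it in a single rank; the remaining ranks can
        # serve other vectors concurrently (vector-level parallelism).
        return [(0, 0, vector_len)]
    # Large vector: stripe contiguous chunks round-robin across ranks so
    # every rank streams part of the same vector (feature-level parallelism).
    slices = []
    for i, start in enumerate(range(0, vector_len, RANK_CHUNK)):
        end = min(start + RANK_CHUNK, vector_len)
        slices.append((i % NUM_RANKS, start, end))
    return slices

print(allocate(8))   # [(0, 0, 8)] -- one rank, others stay free
print(allocate(40))  # [(0, 0, 16), (1, 16, 32), (2, 32, 40)] -- striped
```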
ISSN: 1383-7621, 1873-6165
DOI: 10.1016/j.sysarc.2022.102602