G-NMP: Accelerating Graph Neural Networks with DIMM-based Near-Memory Processing
| Field | Value |
|---|---|
| Published in | Journal of Systems Architecture, 2022-08, Vol. 129, p. 102602, Article 102602 |
| Authors | , , , , , , , |
| Format | Article |
| Language | English |
| Online access | Full text |
Abstract: Graph Neural Networks (GNNs) are of great value in numerous applications and promote the development of cognitive intelligence, due to their capability of modeling non-Euclidean data structures. However, the inherent irregularity makes GNNs memory-bound, and the hybrid computing paradigm of GNNs poses significant challenges for efficient deployment on existing hardware architectures. Near-Memory Processing (NMP) is a promising solution for alleviating the memory wall problem. In this paper, we present G-NMP, a practical and efficient DIMM-based NMP solution for accelerating GNNs, which accelerates both sparse Aggregation and dense Combination computations on DIMMs for the first time. We propose a novel G-NMP hardware architecture to efficiently exploit rank-level memory parallelism, and the G-ISA instructions to significantly reduce host memory requests. We conduct several data flow optimizations on G-NMP to improve memory-compute overlap and to realize efficient matrix computation. We then develop an adaptive data allocation strategy for diverse vector sizes to further exploit feature-level parallelism. We also propose a novel memory request scheduling method to achieve flexible and low-overhead DRAM ownership transition between the host and G-NMP. Overall, G-NMP achieves consistent performance advantages across diverse GNN models and datasets, and offers 1.46× overall performance and 1.29× energy efficiency on average compared with the state-of-the-art work.
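For readers unfamiliar with the hybrid computing paradigm the abstract refers to, the following minimal Python sketch (our illustration, not code from the paper; the function name, shapes, and the ReLU non-linearity are assumptions) shows how one GNN layer interleaves a sparse, irregular Aggregation with a dense, regular Combination, which is why the workload mixes memory-bound and compute-bound phases:

```python
# Illustrative sketch of the hybrid GNN computing paradigm: per-layer
# propagation combines a sparse Aggregation (SpMM over the adjacency
# matrix, dominated by irregular memory accesses) with a dense
# Combination (GEMM with the weight matrix, compute-regular).
import numpy as np
import scipy.sparse as sp

def gnn_layer(adj: sp.csr_matrix, feats: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One GNN layer: sparse Aggregation followed by dense Combination."""
    aggregated = adj @ feats          # Aggregation: sparse x dense (memory-bound)
    combined = aggregated @ weight    # Combination: dense x dense (compute-bound)
    return np.maximum(combined, 0.0)  # non-linearity (ReLU, assumed for the sketch)

# Toy usage: 4-node ring graph, 8-dim features, 8 -> 4 weight matrix.
adj = sp.csr_matrix(np.array([[0, 1, 0, 1],
                              [1, 0, 1, 0],
                              [0, 1, 0, 1],
                              [1, 0, 1, 0]], dtype=np.float32))
feats = np.random.rand(4, 8).astype(np.float32)
weight = np.random.rand(8, 4).astype(np.float32)
out = gnn_layer(adj, feats, weight)  # shape (4, 4)
```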
Highlights:
• G-NMP exploits rank-level parallelism and leverages off-the-shelf CPU and DRAM chips.
• The G-ISA instruction set reduces memory requests and alleviates C/A bandwidth pressure.
• Data flow optimization improves memory-compute overlap and reduces memory accesses.
• Adaptive data allocation ensures memory parallelism for diverse vector sizes (sketched below).
• A flexible and low-overhead memory request scheduling method between the CPU and G-NMP.
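As a rough illustration of the adaptive-data-allocation highlight, here is a hypothetical sketch; the constants NUM_RANKS and RANK_CHUNK and the allocate function are invented for illustration and are not the paper's algorithm. The idea it conveys: a short feature vector stays within a single rank so that other ranks remain free to serve other vectors, while a long vector is striped across ranks so that all ranks contribute bandwidth to it in parallel.

```python
# Hypothetical rank-aware allocation sketch (our assumption, not the
# paper's algorithm): adapt the placement policy to the vector size so
# that memory parallelism is preserved for both short and long vectors.
NUM_RANKS = 4    # assumed number of DRAM ranks on the DIMM
RANK_CHUNK = 16  # assumed number of elements a rank serves per access

def allocate(vector_len: int) -> list[tuple[int, int, int]]:
    """Return (rank, start, end) slices covering one feature vector."""
    if vector_len <= RANK_CHUNK:
        # Small vector: keep it in a single rank; the remaining ranks can
        # serve other vectors concurrently (vector-level parallelism).
        return [(0, 0, vector_len)]
    # Large vector: stripe contiguous chunks round-robin across ranks so
    # every rank streams part of the same vector (feature-level parallelism).
    slices = []
    for i, start in enumerate(range(0, vector_len, RANK_CHUNK)):
        end = min(start + RANK_CHUNK, vector_len)
        slices.append((i % NUM_RANKS, start, end))
    return slices

print(allocate(8))   # [(0, 0, 8)] -- one rank, others stay free
print(allocate(40))  # [(0, 0, 16), (1, 16, 32), (2, 32, 40)] -- striped
```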
ISSN: 1383-7621, 1873-6165
DOI: 10.1016/j.sysarc.2022.102602