Multilevel parallelism optimization of stencil computations on SIMDlized NUMA architectures


Bibliographic details
Published in:	The Journal of supercomputing 2021-11, Vol.77 (11), p.13584-13600
Main authors:	Zhang, Kaifang; Su, Huayou; Dou, Yong
Format:	Article
Language:	English
Keywords:	Compilers; Computer Science; Interpreters; Nodes; Optimization; Processor Architectures; Processors; Programming Languages
Online access:	Full text

Description
Abstract:	Stencil computations within a single core or multicores of an SMP node have been over-investigated. However, the demands on HPC’s higher performance and the rapidly increasing number of cores in modern processors pose new challenges for program developers. These cores are typically organized as several NUMA nodes, which are characterized by remote memory across nodes and local memory with uniform memory access within each node. In this paper, we conducted experiments of stencil computations on NUMA systems based on the two most typical processors, ARM and Intel Xeon E5. We leverage a hybrid programming approach by combining MPI and OpenMP to exploit the potential benefits among NUMA nodes and within a NUMA node. Optimizations of the two selected 3D stencil computations involve four-level parallelism: block decomposition for NUMA nodes and processes, thread-level parallelism within a NUMA node, and data-level parallelism within a thread based on SIMD extension. Experimental results show that we obtain a maximum speedup of 7.27× compared to the pure OpenMP implementations on the ARM platform and 11.68× on the Intel platform.
ISSN:	0920-8542 1573-0484
DOI:	10.1007/s11227-021-03823-3
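
Illustrative sketch (not part of the catalog record and not the authors' code): the levels of parallelism named in the abstract can be pictured roughly as follows, assuming a generic 7-point 3D stencil. The names block_range, stencil7_sweep and IDX, the coefficients c0 and c1, and the row-major layout are all hypothetical choices made for this example. MPI communication between blocks (halo exchange) and NUMA pinning are omitted, so only the block decomposition arithmetic, the OpenMP thread level and the SIMD level are shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: split n planes into nparts contiguous blocks, as one
   might do per MPI process or NUMA node. Block `part` owns the half-open
   range [*lo, *hi); any remainder is spread over the first blocks. */
static void block_range(int n, int nparts, int part, int *lo, int *hi)
{
    assert(nparts > 0 && part >= 0 && part < nparts);
    int base = n / nparts;
    int rem = n % nparts;
    *lo = part * base + (part < rem ? part : rem);
    *hi = *lo + base + (part < rem ? 1 : 0);
}

/* Row-major index into an nx * ny * nz grid (assumed layout). */
#define IDX(i, j, k, ny, nz) (((size_t)(i) * (size_t)(ny) + (size_t)(j)) * (size_t)(nz) + (size_t)(k))

/* One Jacobi-style sweep of a 7-point stencil over the interior points of
   planes [ilo, ihi). Boundary points of `out` are left untouched.
   `in` and `out` must not overlap. */
static void stencil7_sweep(const double *restrict in, double *restrict out,
                           int nx, int ny, int nz, int ilo, int ihi,
                           double c0, double c1)
{
    if (ilo < 1) ilo = 1;            /* skip the global boundary planes */
    if (ihi > nx - 1) ihi = nx - 1;

    /* Thread level: planes of this block are shared among OpenMP threads. */
    #pragma omp parallel for schedule(static)
    for (int i = ilo; i < ihi; ++i) {
        for (int j = 1; j < ny - 1; ++j) {
            /* Data level: the unit-stride k loop is offered to the
               compiler for SIMD vectorization. */
            #pragma omp simd
            for (int k = 1; k < nz - 1; ++k) {
                out[IDX(i, j, k, ny, nz)] =
                    c0 * in[IDX(i, j, k, ny, nz)] +
                    c1 * (in[IDX(i - 1, j, k, ny, nz)] + in[IDX(i + 1, j, k, ny, nz)] +
                          in[IDX(i, j - 1, k, ny, nz)] + in[IDX(i, j + 1, k, ny, nz)] +
                          in[IDX(i, j, k - 1, ny, nz)] + in[IDX(i, j, k + 1, ny, nz)]);
            }
        }
    }
}
```

In a hybrid setup, each process would call block_range with its own rank to find its planes and then run stencil7_sweep on that range after exchanging boundary planes with its neighbours. Compiled without OpenMP support, the pragmas are ignored and the sweep runs serially with the same result.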