Parallel algorithm design and optimization of geodynamic numerical simulation application on the Tianhe new-generation high-performance computer

Bibliographic details
Published in:	The Journal of Supercomputing 2024, Vol.80 (1), p.331-362
Main authors:	Yang, Jin; Yang, Wangdong; Qi, Ruixuan; Tsai, Qinyun; Lin, Shengle; Dong, Fengkun; Li, Kenli; Li, Keqin
Format:	Article
Language:	English
Subjects:	Algorithms; Compilers; Computational efficiency; Computer architecture; Computer Science; Computer simulation; Convection; Design optimization; Efficiency; Geodynamics; Interpreters; Iterative methods; Iterative solution; Neon; Processor Architectures; Programming Languages; Simulation; Sparse matrices
Online access:	Full text

Description
Abstract:	CitcomCu is a numerical simulation program for mantle convection in geodynamics that can simulate thermo-chemical convection in a three-dimensional domain. Growing demand for high-precision simulations and larger application scales calls for larger-scale computing systems. However, the parallel efficiency of CitcomCu on large-scale heterogeneous parallel computing systems is difficult to improve; in particular, the code cannot adapt to the current mainstream heterogeneous high-performance computing architecture of CPUs and accelerators. In this paper, we propose a parallel computing framework for geodynamic numerical simulation that uses a heterogeneous computing architecture and is based on the Tianhe new-generation high-performance computer. First, the data partitioning mode of CitcomCu was optimized for the large-scale heterogeneous computing architecture to reduce the overall communication overhead. Second, the iterative solution algorithm of CitcomCu was improved to speed up the solution process. Finally, the SIMD-based NEON instruction set was used for the sparse matrix operations in the solution process to improve parallel efficiency. Based on our parallel computing framework, the optimized CitcomCu was deployed and tested on the Tianhe new-generation high-performance computer. Experimental data showed that the performance of the optimized program was 3.3975 times higher than that of the unoptimized program on a single node. With 50,000 computational cores as the baseline, the parallel efficiency of the unoptimized program on one million computational cores was 36.75%, while that of the optimized program reached 42.71%, a relative improvement of 16.22%. In addition, the optimized program can be executed on 40 million computational cores, with a parallel efficiency of 36.54%.
ISSN:	0920-8542 1573-0484
DOI:	10.1007/s11227-023-05469-9
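
Note on the efficiency figures: the abstract does not define parallel efficiency, so the following is only a minimal sketch under one common assumption, namely that efficiency is measured relative to a baseline run as achieved speedup divided by ideal speedup for a fixed problem size. The function names are hypothetical and are not taken from the paper. The second function checks that the quoted 16.22% is consistent with a relative (not percentage-point) gain from 36.75% to 42.71%.

```python
def relative_parallel_efficiency(base_cores, base_time, cores, time):
    """Efficiency of a run on `cores` relative to a baseline on `base_cores`.

    Assumption (not stated in the abstract): fixed problem size, so the
    ideal run time shrinks in proportion to the number of cores.
    """
    speedup = base_time / time          # achieved speedup over the baseline
    ideal_speedup = cores / base_cores  # speedup under perfect scaling
    return speedup / ideal_speedup


def relative_improvement(old, new):
    """Relative change from `old` to `new`, as a fraction of `old`."""
    return (new - old) / old


# Efficiencies quoted in the abstract for one million cores,
# both measured against a 50,000-core baseline.
unoptimized = 0.3675
optimized = 0.4271

gain = relative_improvement(unoptimized, optimized)
print(f"{gain:.2%}")  # prints 16.22%
```

Read this way, the two efficiency figures and the stated improvement agree: 42.71% is about 1.1622 times 36.75%.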