Performance Comparison of HPX Versus Traditional Parallelization Strategies for the Discontinuous Galerkin Method

Bibliographic Details
Published in:	Journal of Scientific Computing, 2019-08, Vol. 80 (2), pp. 878-902
Main authors:	Bremer, Maximilian; Kazhyken, Kazbek; Kaiser, Hartmut; Michoski, Craig; Dawson, Clint
Format:	Article
Language:	English
Subjects:	Algorithms; Approximation; C plus plus; Computation; Computational Mathematics and Numerical Analysis; Finite element method; Galerkin method; Mathematical and Computational Engineering; Mathematical and Computational Physics; Mathematics; Mathematics and Statistics; Parallel processing; Run time (computers); Shallow water equations; Simulation; Software; Synchronism; Theoretical; Tidal waves
Online access:	Full text

Description
Summary:	As high performance computing moves towards the exascale computing regime, applications are required to expose increasingly fine-grained parallelism to efficiently use next-generation supercomputers. Intended as a solution to the programming challenges associated with these architectures, High Performance ParalleX (HPX) is a task-based C++ runtime that emphasizes the use of lightweight threads and algorithm-dependent synchronization to maximize the parallelism exposed by the application to the machine. The aim of this work is to explore the performance benefits of an HPX parallelization versus an MPI parallelization for the discontinuous Galerkin finite element method for the two-dimensional shallow water equations. We present strong and weak scaling results comparing the performance of HPX versus an MPI parallelization strategy on Knights Landing architectures. Our results indicate that for average task sizes of 3.6 ms, HPX’s runtime overhead is offset by more efficient execution of the application. Furthermore, we demonstrate that, when running with sufficiently large task granularity, HPX is able to outperform the MPI parallelization by a factor of approximately 1.2 for up to 128 nodes.
ISSN:	0885-7474, 1573-7691
DOI:	10.1007/s10915-019-00960-z