A hybrid MPI–OpenMP scheme for scalable parallel pseudospectral computations for fluid turbulence


Bibliographic Details
Published in: Parallel Computing 2011-06, Vol. 37 (6), p. 316-326
Authors: Mininni, Pablo D.; Rosenberg, Duane; Reddy, Raghu; Pouquet, Annick
Format: Article
Language: English
Online access: Full text
Description
Abstract:

► A two-level hybrid OpenMP/MPI parallelization scheme is presented for pseudospectral computations of fluid turbulence.
► The hybrid scheme leads naturally to a new picture for the domain decomposition of the grids.
► The hybrid scheme scales well up to ∼20,000 compute cores with a maximum parallel efficiency of 89%.
► The method allows us to reduce the number of MPI tasks and increase network bandwidth.
► The new scheme is competitive with the pure MPI-based method, but does not provide a clear “win” in our tests.

A hybrid scheme that utilizes MPI for distributed-memory parallelism and OpenMP for shared-memory parallelism is presented. The work is motivated by the desire to achieve exceptionally high Reynolds numbers in pseudospectral computations of fluid turbulence on emerging petascale, high core-count, massively parallel processing systems. The hybrid implementation derives from, and augments, a well-tested scalable MPI-parallelized pseudospectral code. The hybrid paradigm leads to a new picture for the domain decomposition of the pseudospectral grids, which is helpful in understanding, among other things, the 3D transpose of the global data that is necessary for the parallel fast Fourier transforms that are the central component of the numerical discretizations. Details of the hybrid implementation are provided, and performance tests illustrate the utility of the method. It is shown that the hybrid scheme achieves good scalability up to ∼20,000 compute cores, with a maximum parallel efficiency of 89% and a mean of 79%. Data are presented that help guide the choice of the optimal number of MPI tasks and OpenMP threads in order to maximize code performance on two different platforms.
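To make the two-level structure concrete, below is a minimal sketch (not the authors' actual code) of the pattern the abstract describes: MPI ranks own 1D slabs of an N³ grid, OpenMP threads share the work inside each slab, and an MPI_Alltoall carries out the global transpose needed between the per-direction FFT stages. The grid size N and the placeholder fft_1d_inplace transform are illustrative assumptions; a real code would call a 1D FFT library routine there and would also thread the local pack/unpack around the transpose.

```c
/* Hybrid MPI+OpenMP slab sketch: threads within a slab, Alltoall between
 * slabs. Compile with: mpicc -fopenmp sketch.c */
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

#define N 64  /* assumed cube dimension; must be divisible by nranks */

/* Stand-in for a 1D FFT along the fastest (x) index. */
static void fft_1d_inplace(double *line, int n) { (void)line; (void)n; }

int main(int argc, char **argv) {
    int provided, rank, nranks;
    /* FUNNELED suffices here: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    int nz_local = N / nranks;             /* slab thickness per MPI task  */
    size_t slab = (size_t)nz_local * N * N;
    double *u  = calloc(slab, sizeof *u);  /* local slab, u[z][y][x]       */
    double *ut = calloc(slab, sizeof *ut); /* receive buffer for transpose */

    /* Stage 1: OpenMP threads share the (z,y) lines of the local slab;
     * each line is transformed along x independently. */
    #pragma omp parallel for collapse(2) schedule(static)
    for (int z = 0; z < nz_local; z++)
        for (int y = 0; y < N; y++)
            fft_1d_inplace(&u[((size_t)z * N + y) * N], N);

    /* Stage 2: global 3D transpose. Every rank exchanges equal-size blocks
     * with every other rank so the remaining direction becomes local. */
    MPI_Alltoall(u,  (int)(slab / nranks), MPI_DOUBLE,
                 ut, (int)(slab / nranks), MPI_DOUBLE, MPI_COMM_WORLD);

    /* Stage 3: transform along the now-local direction, again threaded. */
    #pragma omp parallel for collapse(2) schedule(static)
    for (int z = 0; z < nz_local; z++)
        for (int y = 0; y < N; y++)
            fft_1d_inplace(&ut[((size_t)z * N + y) * N], N);

    free(u); free(ut);
    MPI_Finalize();
    return 0;
}
```

In this pattern the trade-off the abstract measures can be explored directly: for a fixed core count, fewer MPI tasks (larger slabs) with more OpenMP threads per task (e.g., mpirun -np 16 with OMP_NUM_THREADS=8 versus -np 128 with one thread) shrink the Alltoall message count and can improve effective network bandwidth, at the cost of threading overhead inside each slab.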
ISSN: 0167-8191
eISSN: 1872-7336
DOI: 10.1016/j.parco.2011.05.004