GPU-accelerated 3-D Finite Volume Particle Method

Bibliographic details
Published in: Computers & Fluids, 2018-07, Vol. 171, pp. 79-93
Main authors: Alimirzazadeh, Siamak, Jahanbakhsh, Ebrahim, Maertens, Audrey, Leguizamón, Sebastián, Avellan, François
Format: Article
Language: English
Description
Abstract:
• A GPU-accelerated FVPM solver for simulating free-surface flows is presented.
• A roofline performance model is used to rationalize optimization strategies.
• A space-filling curve and an octree data structure allow efficient neighbor search.
• Computing interaction vectors is the most time-consuming task.
• Computation on an NVIDIA® Tesla™ P100 is 6 times faster than on a 28-core CPU node.

In previous works (Jahanbakhsh et al., CMAME 298 (2016): 80–107; Jahanbakhsh et al., CMAME 317 (2017): 102–127; see [1] and [2]), the authors introduced SPHEROS, a 3-D particle-based solver based on the Finite Volume Particle Method (FVPM) with a spherical top-hat kernel. In the present work, the authors present the algorithms and optimization procedures that made it possible to significantly accelerate the computations by exploiting the computational power of Graphics Processing Units (GPUs). The new accelerated solver, GPU-SPHEROS, is developed in CUDA and runs entirely on the GPU. All parallel algorithms and data structures have been designed specifically for the many-core GPU architecture. A roofline model is used to assess the performance of the kernels and to apply appropriate optimization strategies. In particular, the neighbor search algorithm, which accounts for almost a third of the overall compute time, uses an efficient Space-Filling Curve (SFC) together with an optimized octree construction procedure. The memory-bound interaction vector computation, which accounts for almost two thirds of the overall compute time, uses fixed-size memory pre-allocation and an efficient data ordering to reduce memory transactions and the cost of dynamic memory operations, i.e. allocation and deallocation. As a case study, numerical simulation results for the deviation of a water jet by the rotating buckets of a Pelton turbine are presented and compared with available experimental data.
For this case, a speedup of almost six was achieved on a single NVIDIA® Tesla™ P100-SXM2-16 GB GPU with the GP100 Pascal architecture, compared with a dual-socket CPU node equipped with two Intel® Xeon® E5-2690 v4 (Broadwell) CPUs with 28 physical cores in total.
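The abstract attributes part of the neighbor-search efficiency to a space-filling curve (SFC) combined with an octree. The following is a minimal Python sketch of that general idea, assuming a Morton (Z-order) curve. This is only one possible SFC, and the record does not say which curve GPU-SPHEROS uses. The names `morton_key`, `quantize`, `octree_cell`, and `sort_by_sfc`, as well as the 10-bit-per-axis resolution, are hypothetical. Particle coordinates are quantized to a grid, each particle receives a key by bit interleaving, and sorting by key places particles that share an octree cell next to each other. The cell at depth d is simply the top 3d bits of the key.

```python
# Illustrative sketch only (assumption): a Morton / Z-order key is used as the
# space-filling curve. GPU-SPHEROS may use a different curve and resolution.

BITS = 10  # hypothetical: bits per axis -> 30-bit key, 2**BITS cells per axis


def morton_key(ix: int, iy: int, iz: int) -> int:
    """Interleave the low BITS bits of (ix, iy, iz) as ... x1 y1 z1 x0 y0 z0."""
    key = 0
    for b in range(BITS):
        key |= ((ix >> b) & 1) << (3 * b + 2)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b)
    return key


def quantize(v: float, lo: float, hi: float) -> int:
    """Map a coordinate in [lo, hi) to an integer cell index in [0, 2**BITS - 1]."""
    cells = 1 << BITS
    t = int((v - lo) / (hi - lo) * cells)
    return min(max(t, 0), cells - 1)  # clamp points on or outside the box edges


def octree_cell(key: int, depth: int) -> int:
    """Octree cell containing `key` at `depth` (0 = root): the top 3*depth bits."""
    return key >> (3 * (BITS - depth))


def sort_by_sfc(points, lo=0.0, hi=1.0):
    """Return (order, keys): particle indices sorted along the curve, and all keys."""
    keys = [morton_key(*(quantize(c, lo, hi) for c in p)) for p in points]
    order = sorted(range(len(points)), key=lambda i: keys[i])
    return order, keys


if __name__ == "__main__":
    pts = [(0.9, 0.9, 0.9), (0.1, 0.1, 0.1), (0.6, 0.1, 0.1), (0.1, 0.1, 0.2)]
    order, keys = sort_by_sfc(pts)
    print(order)                                      # [1, 3, 2, 0]
    print([octree_cell(keys[i], 1) for i in order])   # [0, 0, 4, 7]
```

Because every octree node corresponds to a contiguous range of sorted keys, an octree can be built directly from the sorted key array, and the neighbor candidates of a particle tend to lie close to it in memory. On a GPU, this ordering helps make memory accesses more coalesced. A CUDA implementation would typically compute the keys in parallel and use a parallel radix sort instead of Python's `sorted`.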
ISSN: 0045-7930, 1879-0747
DOI: 10.1016/j.compfluid.2018.05.030