GPU accelerated Staggered Update Procedure (SUP)

Bibliographic details
Published in:	Computers & Fluids 2024-10, Vol.283, p.106408, Article 106408
Main authors:	Subudhi, Shubhashree; Khillare, Amol; Munikrishna, N.; Balakrishnan, N.
Format:	Article
Language:	English
Subjects:	Graphics processing unit (GPU); Higher order finite volume method; Meshless solvers; OpenACC; Speedup; Staggered Update Procedure (SUP)

Description
Abstract:	The advancement in programmable capability of graphics hardware has opened new opportunities in the domain of high performance computing (HPC). The computational fluid dynamics (CFD) community, being a significant user of HPC, has started exploiting the inherent data parallelism in numerical solvers to make efficient use of these many-core, high-throughput, accelerator-based processors. In the present work, we examine the process of accelerating our CPU-based Staggered Update Procedure (SUP) solver, i.e., a higher order accurate cell-centred finite volume solver, by off-loading the computationally most expensive region of the code, namely the explicit residual computation. We have adopted OpenACC, a directive-based programming model, to expose parallelism in the code. The framework evolved for GPU porting in the context of SUP is also of value to those intending to port their CFD solvers based on classical finite volume methodology. The performance analysis is conducted using scalar convection–diffusion equations in both two and three dimensions. The findings demonstrate a speedup factor of 9 (in 2D) and 28 (in 3D) when considering the explicit residual alone, achieved with a single NVIDIA Tesla V100 GPU card. In addition, we could establish superior algorithmic scalability by way of recovering near perfect serial performance on the heterogeneous CPU+GPU architecture. Further overall code acceleration can be achieved by porting other parts of the solver to the GPU.

• Acceleration of SUP, a higher order finite volume solver, by off-loading the explicit residual calculation onto the GPU.
• Implementation of OpenACC compiler directives to expose parallelism in the solver.
• Speedup factors of 9 (for 2D) and 28 (for 3D) are obtained on a single NVIDIA Tesla V100 GPU card.
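To make the off-loading idea in the abstract concrete, the following is a minimal sketch of an explicit residual loop annotated with an OpenACC directive. It is not the authors' SUP implementation: it assumes a 1D scalar convection-diffusion equation on a uniform periodic grid, first-order upwind convection and central diffusion, whereas the paper treats a higher order scheme in 2D and 3D. The function name `explicit_residual` and all parameter names are hypothetical.

```c
#include <stddef.h>

/* Illustrative only: explicit residual of u_t + a u_x = nu u_xx on a
 * uniform periodic grid of n cells with spacing dx.
 * Assumed discretisation (not taken from the paper): first-order upwind
 * convective flux, central diffusive flux.
 * res[i] = -(F_{i+1/2} - F_{i-1/2}) / dx                               */
void explicit_residual(int n, const double *u, double *res,
                       double a, double nu, double dx)
{
    /* Each cell gathers its own two face fluxes, so iterations are
     * independent and the loop can be off-loaded without atomics.
     * A compiler without OpenACC support ignores the directive and
     * runs the loop serially on the CPU.                               */
    #pragma acc parallel loop copyin(u[0:n]) copyout(res[0:n])
    for (int i = 0; i < n; ++i) {
        int im = (i == 0) ? n - 1 : i - 1;      /* periodic left neighbour  */
        int ip = (i == n - 1) ? 0 : i + 1;      /* periodic right neighbour */

        /* Upwind state at each face depends on the sign of a. */
        double u_right_face = (a >= 0.0) ? u[i]  : u[ip];
        double u_left_face  = (a >= 0.0) ? u[im] : u[i];

        /* Total flux = convective part minus diffusive part. */
        double flux_right = a * u_right_face - nu * (u[ip] - u[i]) / dx;
        double flux_left  = a * u_left_face  - nu * (u[i] - u[im]) / dx;

        res[i] = -(flux_right - flux_left) / dx;
    }
}
```

The cell-based gather form is chosen here because it avoids write conflicts between threads; a face-based loop that scatters fluxes to two neighbouring cells would need atomic updates or colouring. With an OpenACC-capable compiler the same source can target a GPU, which is the portability argument the abstract makes for a directive-based model.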
ISSN:	0045-7930
DOI:	10.1016/j.compfluid.2024.106408