Performance of a three-dimensional unstructured mesh compressible flow solver on NVIDIA Fermi-class graphics processing unit hardware


Bibliographic details
Published in:	International journal for numerical methods in fluids 2013-05, Vol.72 (2), p.259-268
Main author:	Waltz, Jacob
Format:	Article
Language:	English
Subjects:	Compressible flow; Eulerian; explicit; finite element; Finite element method; Hardware; Mathematical models; parallelization; partial differential equations; Programming; Solvers; Threaded; Three dimensional
Online access:	Full text

Description
Summary:	We describe the performance of Chicoma, a 3D unstructured mesh compressible flow solver, on graphics processing unit (GPU) hardware. The approach used to deploy the solver on GPU architectures derives from the threaded multicore execution model used in Chicoma, and attempts to improve memory performance via the application of graph theory techniques. The result is a scheme that can be deployed on the GPU with high-level programming constructs, for example, compiler directives, rather than low-level programming extensions. With an NVIDIA Fermi-class GPU (NVIDIA Corp., Santa Clara, CA, USA) and double precision floating point arithmetic, we observe performance gains of 4–5× on problem sizes of 10^6–10^7 tetrahedra. We also compare GPU performance to threaded multicore performance with OpenMP and demonstrate hybrid multicore-GPU calculations with adaptive mesh refinement. Published 2012. This article is a US Government work and is in the public domain in the USA. Short summary: We describe the performance of a 3D unstructured mesh compressible flow solver on graphics processing unit (GPU) hardware. Using a graph-theoretic optimization approach, we achieve speed-ups of 4–5× on meshes up to 10^7 tetrahedra. Hybrid GPU-OpenMP calculations with adaptive mesh refinement are also demonstrated.
ISSN:	0271-2091 1097-0363
DOI:	10.1002/fld.3744