High-performance multi-GPU solver for describing nonlinear acoustic waves in homogeneous thermoviscous media

Bibliographic details
Published in: Computers & fluids 2018-09, Vol.173, p.195-205
Main authors: Diaz, Manuel A., Solovchuk, Maxim A., Sheu, Tony W.H.
Format: Article
Language: English
Description
Summary:
• A multi-GPU 3-D solver for modeling ultrasound in thermoviscous media is presented.
• The proposed algorithm is based on WENO-Z and third-order Runge–Kutta schemes.
• A new multi-GPU communication scheme for the Runge–Kutta scheme is developed.
• The optimization process used in developing a single- and a multi-GPU solver is detailed.
• Simulations using single and multiple GPUs were performed to illustrate the method.

A double-precision numerical solver is presented that describes the propagation of high-intensity ultrasound fluctuations using a novel finite-amplitude compressible acoustic model and runs on multiple graphics processing units (GPUs). The solver is based on a conservative hyperbolic formulation derived from a variational analysis of the compressible Navier–Stokes equations and is implemented using an explicit high-order finite-difference strategy. A WENO-Z reconstruction scheme and a high-order finite-difference stencil are used to approximate the convective and diffusive spatial operators, respectively. The spatial operators are then coupled with a low-storage Runge–Kutta scheme to integrate the system explicitly in time. The multi-GPU implementation aims to make the best use of every GPU and to achieve optimal performance of the algorithm on a per-node basis. To assess the performance of the solver, a typical mini-server computer with 4 Tesla K80 dual-GPU accelerators is used. The results show that the formulation scales linearly for large-domain problems. Moreover, the MPI-GPU implementation outperforms an OpenMP implementation running on a 4.2 GHz i7 processor by a factor of 99. The multi-GPU solver is illustrated with a three-dimensional simulation of high-intensity focused ultrasound propagation.
ISSN:0045-7930
1879-0747
DOI:10.1016/j.compfluid.2018.03.008
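
The summary names a WENO-Z reconstruction combined with a third-order Runge–Kutta time integrator. The sketch below shows that pairing on a deliberately simple stand-in problem and is not taken from the paper. It assumes 1-D linear advection with positive speed on a periodic grid, the fifth-order WENO-Z weights with exponent 1, and the Shu–Osher strong-stability-preserving form of the third-order Runge–Kutta scheme. The names `weno_z5`, `rhs`, `ssp_rk3_step` and `advect`, and the value of `EPS`, are illustrative choices only.

```python
import math

EPS = 1e-40                 # guard against division by zero (assumed value)
D0, D1, D2 = 0.1, 0.6, 0.3  # ideal fifth-order linear weights


def weno_z5(vm2, vm1, v0, vp1, vp2):
    """WENO-Z value at interface i+1/2 from cells i-2 .. i+2 (left-biased)."""
    # Third-order candidate reconstructions on the three sub-stencils.
    p0 = (2.0 * vm2 - 7.0 * vm1 + 11.0 * v0) / 6.0
    p1 = (-vm1 + 5.0 * v0 + 2.0 * vp1) / 6.0
    p2 = (2.0 * v0 + 5.0 * vp1 - vp2) / 6.0
    # Jiang-Shu smoothness indicators of the sub-stencils.
    b0 = 13.0 / 12.0 * (vm2 - 2.0 * vm1 + v0) ** 2 + 0.25 * (vm2 - 4.0 * vm1 + 3.0 * v0) ** 2
    b1 = 13.0 / 12.0 * (vm1 - 2.0 * v0 + vp1) ** 2 + 0.25 * (vm1 - vp1) ** 2
    b2 = 13.0 / 12.0 * (v0 - 2.0 * vp1 + vp2) ** 2 + 0.25 * (3.0 * v0 - 4.0 * vp1 + vp2) ** 2
    # WENO-Z: a global indicator tau5 sharpens the nonlinear weights.
    tau5 = abs(b0 - b2)
    a0 = D0 * (1.0 + tau5 / (b0 + EPS))
    a1 = D1 * (1.0 + tau5 / (b1 + EPS))
    a2 = D2 * (1.0 + tau5 / (b2 + EPS))
    return (a0 * p0 + a1 * p1 + a2 * p2) / (a0 + a1 + a2)


def rhs(u, a, dx):
    """Conservative spatial operator -d(a*u)/dx for a > 0 on a periodic grid."""
    n = len(u)
    # flux[i] is the numerical flux at interface i+1/2.
    flux = [a * weno_z5(u[(i - 2) % n], u[(i - 1) % n], u[i],
                        u[(i + 1) % n], u[(i + 2) % n]) for i in range(n)]
    # flux[i - 1] wraps to the last interface when i == 0 (periodicity).
    return [-(flux[i] - flux[i - 1]) / dx for i in range(n)]


def ssp_rk3_step(u, a, dx, dt):
    """One Shu-Osher SSP third-order Runge-Kutta step."""
    l0 = rhs(u, a, dx)
    u1 = [ui + dt * li for ui, li in zip(u, l0)]
    l1 = rhs(u1, a, dx)
    u2 = [0.75 * ui + 0.25 * (vi + dt * li) for ui, vi, li in zip(u, u1, l1)]
    l2 = rhs(u2, a, dx)
    return [ui / 3.0 + 2.0 / 3.0 * (vi + dt * li) for ui, vi, li in zip(u, u2, l2)]


def advect(u0, a, dx, t_end, n_steps):
    """Advance u0 to time t_end using n_steps equal time steps."""
    u = list(u0)
    dt = t_end / n_steps
    for _ in range(n_steps):
        u = ssp_rk3_step(u, a, dx, dt)
    return u


if __name__ == "__main__":
    n = 64
    dx = 1.0 / n
    u0 = [math.sin(2.0 * math.pi * i * dx) for i in range(n)]
    u = advect(u0, 1.0, dx, 1.0, 160)  # one full period at CFL 0.4
    print(max(abs(x - y) for x, y in zip(u, u0)))
```

In this sketch every evaluation of `rhs` reads up to three neighbouring cells on the upwind side and two on the other, so a domain-decomposed version would have to refresh its halo cells before each of the three Runge–Kutta stages. That is the general reason stage-wise communication matters for schemes of this kind. The sketch does not reproduce the communication scheme developed in the paper.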