Accelerating atmospheric physics parameterizations using graphics processing units

Bibliographic Details
Published in: The International Journal of High Performance Computing Applications, 2024-07, Vol. 38 (4), p. 282-296
Main Authors: Abdi, Daniel S.; Jankov, Isidora
Format: Article
Language: English
Subjects:
Online Access: Full text
Description
Abstract: As part of a project aimed at exploring the use of next-generation high-performance computing technologies for numerical weather prediction, we have ported two physics modules from the Common Community Physics Package (CCPP) to Graphics Processing Units (GPUs) and obtained accelerations of up to 10× relative to a comparable multi-core CPU. The physics parameterizations accelerated in this work are the aerosol-aware Thompson microphysics (TH) scheme and the Grell–Freitas (GF) cumulus convection scheme. Microphysics schemes are among the most time-consuming physics parameterizations, second only to radiative process schemes, and our results show better acceleration for the TH scheme than for the GF scheme. Multi-GPU implementations of the schemes show acceptable weak scaling within a single node with 8 GPUs, and perfect weak scaling on multiple nodes using one GPU per node. The absence of inter-node communication for column physics parameterizations contributes to their scalability; however, because physics parameterizations are run together with the dynamics, the overall multi-GPU performance is often governed by the latter. In the context of optimizing CCPP physics modules, our observations underscore that the extensive use of automatic arrays within inner subroutines hampers GPU performance due to serialized memory allocations. We have used the OpenACC directive-based programming model for this work because it allows for easy porting of large amounts of code and makes code maintenance more manageable compared to lower-level approaches such as CUDA and OpenCL.
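Two of the abstract's performance points, that independent atmospheric columns parallelize without inter-node communication and that automatic arrays inside inner subroutines serialize device allocations, can be illustrated with a minimal OpenACC sketch. The sketch below is not taken from the paper (the CCPP schemes themselves are Fortran); it is a hypothetical C/OpenACC column-physics driver in which per-column scratch storage is allocated once and hoisted out of the offloaded loop rather than declared as an automatic array in a called routine. All names, array shapes, and the placeholder computation are assumptions made for illustration.

```c
#include <stdlib.h>

/* Hypothetical column-physics driver: columns are independent, so the outer
 * loop over columns is offloaded directly. The per-level computation is a
 * placeholder, not the Thompson or Grell-Freitas scheme. */
void physics_column_driver(int ncol, int nlev,
                           const double *temp,  /* flattened [ncol][nlev] input   */
                           double *qv)          /* flattened [ncol][nlev], in/out */
{
    size_t n = (size_t)ncol * (size_t)nlev;

    /* Scratch space is allocated once here and handed to the device with a
     * create() clause, instead of appearing as an automatic array inside an
     * inner subroutine, where every call would trigger a device-side
     * allocation that serializes across threads. */
    double *scratch = malloc(n * sizeof *scratch);

    #pragma acc parallel loop gang \
        copyin(temp[0:n]) copy(qv[0:n]) create(scratch[0:n])
    for (int i = 0; i < ncol; ++i) {          /* one gang per column: no halo exchange needed */
        #pragma acc loop vector
        for (int k = 0; k < nlev; ++k) {      /* vertical levels within the column */
            size_t idx = (size_t)i * nlev + k;
            scratch[idx] = 0.01 * temp[idx];  /* placeholder per-level work */
            qv[idx] += scratch[idx];
        }
    }

    free(scratch);
}
```

Compiled with an OpenACC-aware compiler (for example, nvc -acc), the gang loop maps columns to thread blocks and the vector loop maps vertical levels to threads; because each column touches only its own data, the same pattern extends to multiple GPUs by partitioning columns across devices, consistent with the weak scaling described in the abstract.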
ISSN: 1094-3420, 1741-2846
DOI: 10.1177/10943420241238711