Accelerating atmospheric physics parameterizations using graphics processing units
Saved in:
Published in: The International Journal of High Performance Computing Applications, 2024-07, Vol. 38 (4), pp. 282-296
Main authors:
Format: Article
Language: eng
Subjects:
Online access: Full text
Abstract: As part of a project aimed at exploring the use of next-generation high-performance computing technologies for numerical weather prediction, we have ported two physics modules from the Common Community Physics Package (CCPP) to Graphics Processing Units (GPUs) and obtained speedups of up to 10× relative to a comparable multi-core CPU. The physics parameterizations accelerated in this work are the aerosol-aware Thompson microphysics (TH) scheme and the Grell–Freitas (GF) cumulus convection scheme. Microphysics schemes are among the most time-consuming physics parameterizations, second only to radiative process schemes, and our results show greater acceleration for the TH scheme than for the GF scheme. Multi-GPU implementations of the schemes show acceptable weak scaling within a single node with 8 GPUs, and perfect weak scaling across multiple nodes using one GPU per node. The absence of inter-node communication in column physics parameterizations contributes to their scalability; however, because physics parameterizations run alongside the dynamics, overall multi-GPU performance is often governed by the latter. In the context of optimizing CCPP physics modules, our observations underscore that extensive use of automatic arrays within inner subroutines hampers GPU performance due to serialized memory allocations. We used the OpenACC directive-based programming model for this work because it allows large amounts of code to be ported easily and makes code maintenance more manageable than lower-level approaches such as CUDA and OpenCL.
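As a rough illustration of the porting pattern the abstract describes, the sketch below shows a column-independent physics loop offloaded with OpenACC directives, with per-column scratch storage drawn from a workspace allocated once by the caller rather than declared as an automatic array inside the inner routine. This is a minimal, hypothetical example under stated assumptions, not code from the TH or GF schemes: the module, subroutine, and variable names (column_physics_sketch, physics_driver, column_update, temp, qv, work, buf) are placeholders, and the arithmetic is filler.

```fortran
! Minimal sketch (not the authors' code) of the porting pattern described in
! the abstract: an OpenACC parallel loop over independent columns, with the
! per-column scratch array carved out of a workspace allocated once by the
! caller rather than declared as an automatic array in the inner routine.
module column_physics_sketch
  implicit none
contains

  ! Inner, per-column routine. "buf" is passed-in scratch; declaring it as an
  ! automatic array here (real :: buf(nlev)) would force a device-side
  ! allocation per call, the behavior the paper identifies as a bottleneck.
  subroutine column_update(nlev, temp, qv, buf)
    !$acc routine seq
    integer, intent(in)    :: nlev
    real,    intent(inout) :: temp(nlev), qv(nlev)
    real,    intent(inout) :: buf(nlev)
    integer :: k
    do k = 1, nlev
      buf(k)  = 0.5 * (temp(k) + 273.15)          ! placeholder arithmetic only
      temp(k) = temp(k) + 1.0e-3 * buf(k)
      qv(k)   = max(qv(k) - 1.0e-6 * buf(k), 0.0)
    end do
  end subroutine column_update

  ! Driver: columns are independent, so the column loop maps onto GPU
  ! gangs/vectors with a single directive; "work" is the preallocated
  ! workspace, one column of scratch per grid column.
  subroutine physics_driver(ncol, nlev, temp, qv, work)
    integer, intent(in)    :: ncol, nlev
    real,    intent(inout) :: temp(nlev, ncol), qv(nlev, ncol)
    real,    intent(inout) :: work(nlev, ncol)
    integer :: i
    !$acc parallel loop gang vector copy(temp, qv) create(work)
    do i = 1, ncol
      call column_update(nlev, temp(:, i), qv(:, i), work(:, i))
    end do
  end subroutine physics_driver

end module column_physics_sketch
```

In a full scheme the workspace would presumably be sized and allocated once per task (or supplied by the host model), and the copy/create clauses would typically be replaced by an enclosing !$acc data region so fields stay resident on the GPU across physics calls; the essential point, following the paper's observation, is that no allocations occur inside the device region.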
ISSN: 1094-3420, 1741-2846
DOI: 10.1177/10943420241238711