Development of a parallel CUDA algorithm for solving 3D guiding center problems
Published in: | Computer Physics Communications 2022-07, Vol.276, p.108331, Article 108331 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | In this study, we develop a novel compute unified device architecture (CUDA) algorithm, which we call C-ECM3, for solving a three-dimensional (3D) guiding center problem. C-ECM3 is a parallel algorithm for the iteration-free backward semi-Lagrangian method with third-order temporal accuracy (ECM3). A well-known challenge in speeding up a CUDA program is designing kernel functions that make efficient use of the memory hierarchy, whose levels differ in access speed. To address this challenge, C-ECM3 centers on a decomposition strategy for solving the very large number of Cauchy problems that the method generates. The strategy divides the 9×9 linear system for each Cauchy problem in ECM3 into two 3×3 linear systems, which are easier to solve, and then solves these small systems explicitly using Cramer's rule. As a result, C-ECM3 allows us to design an array-free kernel function that uses the memory hierarchy efficiently. In addition, C-ECM3 significantly reduces the run time for tracing particle trajectories compared to other graphics processing unit (GPU) programs that use the usual Gaussian elimination algorithm. The Kelvin-Helmholtz instability and a 3D guiding center problem are simulated to provide numerical evidence for C-ECM3. These numerical experiments verify that C-ECM3 significantly improves computational speed compared to other methods while maintaining the accuracy of the central processing unit (CPU) version of ECM3. The validity of C-ECM3 is also confirmed by showing that it agrees with Shoucri's analysis of the Kelvin-Helmholtz instability. |
---|---|
ISSN: | 0010-4655, 1879-2944 |
DOI: | 10.1016/j.cpc.2022.108331 |
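
The abstract states that the small 3×3 systems are solved explicitly with Cramer's rule inside an array-free kernel. The following is a minimal sketch of what such a solver could look like, not the authors' implementation: the names `det3`, `solve3x3_cramer`, and the `HD` macro are hypothetical, and the singularity tolerance `tol` is an assumption added for illustration. Every matrix entry is passed as a scalar, so a compiler is free to keep all operands in registers rather than in indexed local arrays. The code compiles as plain C++ and, under the CUDA compiler, as a host/device function.

```cpp
// Mark functions as callable from both host and device when built with nvcc;
// expands to nothing under an ordinary C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Determinant of a 3x3 matrix whose entries are given as nine scalars
// (a_ij = row i, column j), expanded along the first row.
HD inline double det3(double a11, double a12, double a13,
                      double a21, double a22, double a23,
                      double a31, double a32, double a33) {
    return a11 * (a22 * a33 - a23 * a32)
         - a12 * (a21 * a33 - a23 * a31)
         + a13 * (a21 * a32 - a22 * a31);
}

// Solve A x = b for a 3x3 system with Cramer's rule, using scalars only.
// Returns false (leaving x1, x2, x3 untouched) when |det A| <= tol.
HD inline bool solve3x3_cramer(double a11, double a12, double a13,
                               double a21, double a22, double a23,
                               double a31, double a32, double a33,
                               double b1, double b2, double b3,
                               double tol,
                               double& x1, double& x2, double& x3) {
    const double d = det3(a11, a12, a13, a21, a22, a23, a31, a32, a33);
    const double abs_d = d < 0.0 ? -d : d;
    if (abs_d <= tol) return false;

    const double inv_d = 1.0 / d;
    // Replace column k of A by b to obtain the numerator for x_k.
    x1 = det3(b1, a12, a13, b2, a22, a23, b3, a32, a33) * inv_d;
    x2 = det3(a11, b1, a13, a21, b2, a23, a31, b3, a33) * inv_d;
    x3 = det3(a11, a12, b1, a21, a22, b2, a31, a32, b3) * inv_d;
    return true;
}
```

In a GPU setting, each thread could call `solve3x3_cramer` once per small system it owns. Compared with Gaussian elimination, the closed-form expressions have no pivoting loop and no data-dependent indexing, which is one plausible reason such a formulation fits an array-free kernel.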