Portability of Fortran's `do concurrent' on GPUs
Main authors: | , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | There is continuing interest in using standard language constructs for
accelerated computing in order to avoid external APIs, some of which are
vendor-specific. For Fortran codes, the `do concurrent` (DC) loop has been
successfully demonstrated on the NVIDIA platform. However, support for DC on
other platforms has taken longer to implement. Recently, Intel has added DC GPU
offload support to its compiler, as has HPE for AMD GPUs. In this paper, we
explore the current portability of DC across GPU vendors using the
in-production solar surface flux evolution code, HipFT. We discuss
implementation and compilation details, including when and where directive
APIs for data movement are needed or desirable compared to relying on a unified
memory system. The performance achieved on both data center and consumer
platforms is shown. |
---|---|
DOI: | 10.48550/arxiv.2408.07843 |