Optimization with the OpenACC-to-FPGA framework on the Arria 10 and Stratix 10 FPGAs
Published in: Parallel Computing, July 2021, Vol. 104-105, Article 102784
Format: Article
Language: English
Abstract: The reconfigurable computing paradigm with field programmable gate arrays (FPGAs) has received renewed interest in the high-performance computing field due to FPGAs’ unique combination of performance and energy efficiency. However, difficulties in programming and optimizing FPGAs have prevented them from being widely accepted as general-purpose computing devices. In accelerator-based heterogeneous computing, portability across diverse heterogeneous devices is also an important issue, but the unique architectural features of FPGAs make this difficult to achieve. To address these issues, a directive-based, high-level FPGA programming and optimization framework was previously developed. In this work, the developed optimizations were combined holistically using the directive-based approach, showing that each individual benchmark requires a unique set of optimizations to maximize performance. We performed this exploration on Intel Arria 10 and Stratix 10 FPGAs. We also explored the relationships between performance, resource usage, and compilation times, and investigated the implications for performance portability. Finally, we present an initial evaluation of a real-world proxy application, LULESH.
Highlights:
• A categorical organization and summary of the optimizations available in the OpenACC-to-FPGA framework is presented.
• The developed optimizations are evaluated holistically on an array of benchmarks using Arria 10 and Stratix 10 FPGAs.
• The relationships between FPGA resource usage, kernel frequencies, compilation times, and runtime performance are explored.
• Implications for performance portability between the two evaluated FPGAs are investigated.
• An initial evaluation of the LULESH 2.0 proxy application with the OpenACC-to-FPGA framework is performed.
ISSN: 0167-8191, 1872-7336
DOI: 10.1016/j.parco.2021.102784