Flexible task‐DAG management in PHAST library: Data‐parallel tasks and orchestration support for heterogeneous systems
Published in: Concurrency and Computation, 2022-01, Vol. 34 (2), p. n/a
Main authors:
Format: Article
Language: English
Online access: Full text
Abstract:
Heterogeneous architectures have proved successful in achieving unprecedented performance and energy efficiency. However, taking advantage of these diverse processing elements is still hard. Programmers must work with the different programming approaches suited to each target architecture and must decide how to distribute activities across the available resources. The majority of current frameworks focus on either performance or productivity: the former mainly provide low-level, target-specific programming interfaces, while the latter offer high-level tools that often fail to achieve high performance. In both cases, the design is usually data-parallel, and task parallelism is not supported. In this work, we propose a task-based solution within the data-parallel, heterogeneous, single-source PHAST library. Tasks can be coded in a target-agnostic fashion, are compiled and parallelized automatically for multi-core CPUs and NVIDIA GPUs, and support choosing the execution platform at runtime. We evaluate the capabilities of the proposed task-directed-acyclic-graph (task-DAG) support on an extensive set of randomly generated task-based applications with different sizes and characteristics. We compare it against a SYCL implementation in terms of performance and complexity metrics, showing that PHAST achieves about 1.56× and 2.60× speedup over SYCL on multi-core CPU and GPU, respectively, while also improving code complexity metrics.
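The abstract's key mechanism, a DAG of target-agnostic tasks whose execution platform is selected at runtime, can be made concrete with a small illustrative sketch. The C++ below is not the PHAST API: the names Task, TaskGraph, and Target are hypothetical, and the sequential scheduler merely stands in for the automatic CPU/GPU parallelization the library performs.

// Illustrative sketch only: NOT the PHAST API. Task, TaskGraph, and Target
// are hypothetical names used to make the task-DAG idea concrete.
#include <cstdio>
#include <functional>
#include <vector>

enum class Target { CPU, GPU };            // execution platform, chosen at runtime

struct Task {
    std::function<void(Target)> body;      // target-agnostic work
    std::vector<int> deps;                 // indices of predecessor tasks in the DAG
};

struct TaskGraph {
    std::vector<Task> tasks;

    // Register a task and its dependencies; returns the task's index.
    int add(std::function<void(Target)> body, std::vector<int> deps = {}) {
        tasks.push_back({std::move(body), std::move(deps)});
        return static_cast<int>(tasks.size()) - 1;
    }

    // Naive topological execution: run each task once all its predecessors finished.
    // A real heterogeneous runtime would dispatch ready tasks to CPU threads or GPU
    // streams concurrently instead of running them inline.
    void run(Target target) {
        std::vector<bool> done(tasks.size(), false);
        std::size_t finished = 0;
        while (finished < tasks.size()) {
            for (std::size_t i = 0; i < tasks.size(); ++i) {
                if (done[i]) continue;
                bool ready = true;
                for (int d : tasks[i].deps) ready = ready && done[d];
                if (!ready) continue;
                tasks[i].body(target);
                done[i] = true;
                ++finished;
            }
        }
    }
};

int main() {
    TaskGraph g;
    auto name = [](Target t) { return t == Target::GPU ? "GPU" : "CPU"; };
    int a = g.add([&](Target t) { std::printf("task A on %s\n", name(t)); });
    int b = g.add([&](Target t) { std::printf("task B on %s\n", name(t)); }, {a});
    g.add([&](Target t) { std::printf("task C on %s\n", name(t)); }, {a, b});
    g.run(Target::CPU);                    // the platform could come from a config or CLI flag
    return 0;
}

Compiled with any C++11 compiler, this prints the execution order and the chosen platform; in a real heterogeneous runtime, run() would offload each ready task to a CPU thread pool or a GPU stream rather than executing it inline.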
ISSN: 1532-0626, 1532-0634
DOI: 10.1002/cpe.5842