Flexible task‐DAG management in PHAST library: Data‐parallel tasks and orchestration support for heterogeneous systems
Published in: Concurrency and Computation, 2022-01, Vol. 34 (2), p. n/a
Main authors:
Format: Article
Language: English
Online access: Full text
Abstract:
Heterogeneous architectures have proved successful in achieving unprecedented performance and energy efficiency. However, taking advantage of these diverse processing elements is still hard. Programmers must work with the different programming approaches suited to each target architecture and must decide how to distribute activities across the available resources. The majority of current frameworks focus on either performance or productivity: the former mainly provide low-level, target-specific programming interfaces, while the latter offer high-level tools that often fail to achieve high performance. In both cases, the design is usually data-parallel, and task parallelism is not supported. In this work, we propose a task-based solution within the data-parallel, heterogeneous, single-source PHAST library. Tasks can be coded in a target-agnostic fashion, are compiled and parallelized automatically for multi-core CPUs and NVIDIA GPUs, and support choosing the execution platform at runtime. We evaluate the capabilities of the proposed task-directed-acyclic-graph (task-DAG) support on an extensive set of randomly generated task-based applications with different sizes and characteristics. We compare it against a SYCL implementation in terms of performance and complexity metrics, showing that PHAST achieves about 1.56× and 2.60× speedup over SYCL on multi-core CPU and GPU, respectively, while also improving code complexity metrics.
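The abstract's key mechanism, a DAG of target-agnostic tasks whose execution platform is selected at runtime, can be made concrete with a small illustrative sketch. The C++ below is not the PHAST API: the names Task, TaskGraph, and Target are hypothetical, and the sequential scheduler merely stands in for the automatic CPU/GPU parallelization the library performs.

// Illustrative sketch only: NOT the PHAST API. Task, TaskGraph, and Target
// are hypothetical names used to make the task-DAG idea concrete.
#include <cstdio>
#include <functional>
#include <vector>

enum class Target { CPU, GPU };            // execution platform, chosen at runtime

struct Task {
    std::function<void(Target)> body;      // target-agnostic work
    std::vector<int> deps;                 // indices of predecessor tasks in the DAG
};

struct TaskGraph {
    std::vector<Task> tasks;

    // Register a task and its dependencies; returns the task's index.
    int add(std::function<void(Target)> body, std::vector<int> deps = {}) {
        tasks.push_back({std::move(body), std::move(deps)});
        return static_cast<int>(tasks.size()) - 1;
    }

    // Naive topological execution: run each task once all its predecessors finished.
    // A real heterogeneous runtime would dispatch ready tasks to CPU threads or GPU
    // streams concurrently instead of running them inline.
    void run(Target target) {
        std::vector<bool> done(tasks.size(), false);
        std::size_t finished = 0;
        while (finished < tasks.size()) {
            for (std::size_t i = 0; i < tasks.size(); ++i) {
                if (done[i]) continue;
                bool ready = true;
                for (int d : tasks[i].deps) ready = ready && done[d];
                if (!ready) continue;
                tasks[i].body(target);
                done[i] = true;
                ++finished;
            }
        }
    }
};

int main() {
    TaskGraph g;
    auto name = [](Target t) { return t == Target::GPU ? "GPU" : "CPU"; };
    int a = g.add([&](Target t) { std::printf("task A on %s\n", name(t)); });
    int b = g.add([&](Target t) { std::printf("task B on %s\n", name(t)); }, {a});
    g.add([&](Target t) { std::printf("task C on %s\n", name(t)); }, {a, b});
    g.run(Target::CPU);                    // the platform could come from a config or CLI flag
    return 0;
}

Compiled with any C++11 compiler, this prints the execution order and the chosen platform; in a real heterogeneous runtime, run() would offload each ready task to a CPU thread pool or a GPU stream rather than executing it inline.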
ISSN: 1532-0626, 1532-0634
DOI: 10.1002/cpe.5842