Performance Portability of a GPU Enabled Factorization with the DAGuE Framework

Performance portability is a major challenge faced today by developers on heterogeneous high performance computers, consisting of an interconnect, memory with non-uniform access, many-cores and accelerators like GPUs. Recent studies have successfully demonstrated that dense linear algebra operations...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Bosilca, G., Bouteiller, A., Herault, T., Lemarinier, P., Saengpatsa, N. O., Tomov, S., Dongarra, J. J.
Format: Tagungsbericht
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Performance portability is a major challenge faced today by developers on heterogeneous high performance computers, consisting of an interconnect, memory with non-uniform access, many-cores and accelerators like GPUs. Recent studies have successfully demonstrated that dense linear algebra operations can be efficiently handled by runtime systems using a DAG representation. In this work, we present the GPU subsystem of the DAGuE runtime, and assess, on the Cholesky factorization test case, the minimal efforts required by a programmer to enable GPU acceleration in the DAGuE framework. The performance achieved by this unchanged code, on a variety of heterogeneous and distributed many cores and GPU resources, demonstrates the desired performance portability.
ISSN:1552-5244
2168-9253
DOI:10.1109/CLUSTER.2011.51