DAGuE: A generic distributed DAG engine for High Performance Computing


Bibliographic details
Published in:	Parallel Computing, 2012-01, Vol. 38 (1), p. 37-51
Main authors:	Bosilca, George; Bouteiller, Aurelien; Danalis, Anthony; Herault, Thomas; Lemarinier, Pierre; Dongarra, Jack
Format:	Article
Language:	English
Subjects:	Architecture aware scheduling; Communities; Computation; Factorization; Heterogeneous architectures; HPC; Micro-task DAG; Programming environments; Scheduling; State of the art; Strain; Tasks
Online access:	Full text

Description
Summary:	► We propose a DAG-based engine for High Performance Computing. ► We describe the input language and tools of the productivity framework. ► DAG multicore and distributed scheduling is asynchronous and dynamic. ► Many possible target applications, including dense linear algebra factorizations. ► Performance of the DAGuE system outpaces ScaLAPACK and competes with HPL. The frenetic development of current architectures places a strain on state-of-the-art programming environments. Harnessing the full potential of such architectures is a tremendous task for the whole scientific computing community. We present DAGuE, a generic framework for architecture-aware scheduling and management of micro-tasks on distributed many-core heterogeneous architectures. The applications we consider can be expressed as a Directed Acyclic Graph (DAG) of tasks, with labeled edges designating data dependencies. DAGs are represented in a compact, problem-size-independent format that can be queried on demand to discover data dependencies in a totally distributed fashion. DAGuE assigns computation threads to the cores, overlaps communications and computations, and uses a dynamic, fully distributed scheduler based on cache awareness, data locality, and task priority. We demonstrate the efficiency of our approach using several micro-benchmarks to analyze the performance of different components of the framework, and a linear algebra factorization as a use case.
ISSN:	0167-8191 1872-7336
DOI:	10.1016/j.parco.2011.10.003
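
Illustration:	The summary describes a DAG stored in a compact, problem-size-independent form that is queried on demand for data dependencies. The Python sketch below shows that general idea only. It is not DAGuE's input language or API: the choice of a tiled Cholesky factorization, the task names (POTRF, TRSM, SYRK, GEMM), and the functions `successors` and `reachable_tasks` are assumptions made for this example. The graph is never stored; the successors of a task are computed from its indices and the number of tile rows `nt`.

```python
from collections import deque


def successors(task, nt):
    """Return the tasks that consume the output of `task`.

    Hypothetical encoding of a right-looking tiled Cholesky on an
    nt x nt tile grid. Tasks are tuples such as ("POTRF", k) or
    ("GEMM", m, n, k); nothing here is taken from DAGuE itself.
    """
    kind, *idx = task
    if kind == "POTRF":
        (k,) = idx
        # The factored diagonal tile feeds every panel solve below it.
        return [("TRSM", m, k) for m in range(k + 1, nt)]
    if kind == "TRSM":
        m, k = idx
        out = [("SYRK", m, k)]
        # Tile (m, k) is used as the row operand ...
        out += [("GEMM", m, n, k) for n in range(k + 1, m)]
        # ... and as the column operand of trailing updates.
        out += [("GEMM", p, m, k) for p in range(m + 1, nt)]
        return out
    if kind == "SYRK":
        m, k = idx
        # Diagonal tile m is factored once its last update (k = m - 1) is done.
        return [("POTRF", m)] if m == k + 1 else [("SYRK", m, k + 1)]
    if kind == "GEMM":
        m, n, k = idx
        # Off-diagonal tile (m, n) is solved once its last update (k = n - 1) is done.
        return [("TRSM", m, n)] if n == k + 1 else [("GEMM", m, n, k + 1)]
    raise ValueError(f"unknown task kind: {kind!r}")


def reachable_tasks(nt):
    """Discover every task by querying successors on demand from POTRF(0)."""
    start = ("POTRF", 0)
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in successors(queue.popleft(), nt):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

The size of the description is independent of `nt`: only the traversal grows with the problem. A distributed runtime could call such a function locally on each node to find where a task's output must be sent, without any node holding the whole graph; that remark describes this sketch and is not a claim about how DAGuE is implemented.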