Couillard: Parallel programming via coarse-grained Data-flow Compilation

Bibliographic details
Published in: Parallel Computing, 2014-12, Vol. 40 (10), pp. 661–680
Authors: Marzulo, Leandro A. J.; Alves, Tiago A. O.; França, Felipe M. G.; Costa, Vítor Santos
Format: Article
Language: English
Description
Abstract:
• Definition of THLL (TALM High-Level Language) and implementation of the Couillard compiler for THLL.
• Addition of naive placement generation to Couillard, which is sufficient for most regular applications.
• Creation of a placement algorithm based on list scheduling, for use on more complex applications.
• Addition of a work-stealing mechanism to the Trebuchet runtime environment, based on the ABP algorithm.
• Performance evaluation of Couillard/Trebuchet using four applications: Ray-tracer, Needleman–Wunsch, Blackscholes and Ferret.

Data-flow is a natural approach to parallelism. However, describing dependencies and control between fine-grained data-flow tasks can be complex and can introduce unwanted overheads. TALM (TALM is an Architecture and Language for Multi-threading) introduces a user-defined, coarse-grained parallel data-flow model, in which programmers identify code blocks, called super-instructions, to be run in parallel and connect them in a data-flow graph. TALM has been implemented as Trebuchet, a hybrid von Neumann/data-flow execution system. We have observed that TALM's usefulness largely depends on how programmers specify and connect super-instructions. Thus, we present Couillard, a full compiler that generates, from an annotated C program, a data-flow graph and the C code corresponding to each super-instruction. We show that our toolchain allows one to benefit from data-flow execution and to explore sophisticated parallel programming techniques with little effort. To evaluate our system, we executed a set of real applications on a large multi-core machine. Comparison with popular parallel programming methods shows competitive speedups, while providing an easier parallel programming approach. More specifically, for an application that follows the wavefront method, running with big inputs, Trebuchet achieved up to 4.7% speedup over Intel® TBB's novel flow-graph approach and up to 44% over OpenMP.
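The highlights mention a placement algorithm based on list scheduling for mapping super-instructions onto processors. The record does not give the paper's priority function or cost model, so the following Python sketch shows only generic list scheduling under stated assumptions: each super-instruction has a hypothetical cost estimate, there is a fixed number of processing elements (PEs), communication costs are ignored, ready nodes are prioritized by bottom level (own cost plus the costliest path to an exit node), and each node is placed on the PE where it can start earliest. The names `list_schedule` and `bottom_levels` are illustrative, not part of Couillard.

```python
from collections import defaultdict


def bottom_levels(costs, succs):
    """Bottom level of each node: its own cost plus the costliest path to an exit node."""
    memo = {}

    def bl(node):
        if node not in memo:
            memo[node] = costs[node] + max((bl(s) for s in succs.get(node, [])), default=0)
        return memo[node]

    for node in costs:
        bl(node)
    return memo


def list_schedule(costs, edges, num_pes):
    """Greedy list scheduling of a data-flow DAG onto num_pes processing elements.

    costs: dict node -> estimated execution cost (hypothetical estimates)
    edges: list of (producer, consumer) data-flow edges
    Returns dict node -> (pe, start_time, finish_time). Communication is ignored.
    """
    succs, preds = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)
    prio = bottom_levels(costs, succs)
    pending = {n: len(preds[n]) for n in costs}  # inputs not yet produced
    ready = [n for n in costs if pending[n] == 0]
    pe_free = [0] * num_pes  # time at which each PE becomes idle
    placement = {}
    while ready:
        # Highest bottom level first; ties broken by node name for determinism.
        ready.sort(key=lambda n: (-prio[n], str(n)))
        node = ready.pop(0)
        data_ready = max((placement[p][2] for p in preds[node]), default=0)
        # Pick the PE on which this node can start earliest (lowest index on ties).
        pe = min(range(num_pes), key=lambda i: (max(pe_free[i], data_ready), i))
        start = max(pe_free[pe], data_ready)
        placement[node] = (pe, start, start + costs[node])
        pe_free[pe] = start + costs[node]
        # Release successors whose inputs are now all available.
        for s in succs[node]:
            pending[s] -= 1
            if pending[s] == 0:
                ready.append(s)
    return placement


if __name__ == "__main__":
    # Hypothetical fork/join graph: one reader, three workers, one writer.
    costs = {"read": 1, "a": 4, "b": 3, "c": 2, "write": 1}
    edges = [("read", "a"), ("read", "b"), ("read", "c"),
             ("a", "write"), ("b", "write"), ("c", "write")]
    for node, (pe, start, finish) in list_schedule(costs, edges, 2).items():
        print(node, "on PE", pe, "from", start, "to", finish)
```

On the two-PE example, the costliest worker `a` is placed first and shares PE 0 with the reader, while `b` and `c` run back to back on PE 1, so the writer starts once the slowest branch has finished. A real placement pass would also weigh the cost of moving data between PEs, which this sketch deliberately leaves out.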
ISSN: 0167-8191; 1872-7336
DOI: 10.1016/j.parco.2014.10.002
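The abstract's headline result concerns an application that follows the wavefront method (Needleman–Wunsch). As a hedged illustration of why such applications fit a coarse-grained data-flow graph, the Python sketch below assumes the score matrix is split into a grid of hypothetical blocks. Block (i, j) consumes the outputs of its north, west and north-west neighbours, and it fires as soon as all of its inputs have been produced, which is the data-flow firing rule. The sketch only records which blocks become ready together (waves that could run in parallel). It is not THLL or Couillard syntax, and the function name `wavefront_waves` is illustrative.

```python
def wavefront_waves(n_rows, n_cols):
    """Group the blocks of an n_rows x n_cols grid into waves via the data-flow firing rule.

    Block (i, j) consumes the outputs of (i-1, j), (i, j-1) and (i-1, j-1) when they exist.
    A block fires once all of its inputs have been produced; blocks released in the
    same round form one wave and could execute in parallel.
    """
    def inputs(i, j):
        return [(a, b) for a, b in ((i - 1, j), (i, j - 1), (i - 1, j - 1))
                if a >= 0 and b >= 0]

    def outputs(i, j):
        return [(a, b) for a, b in ((i + 1, j), (i, j + 1), (i + 1, j + 1))
                if a < n_rows and b < n_cols]

    # Number of inputs each block is still waiting for.
    missing = {(i, j): len(inputs(i, j)) for i in range(n_rows) for j in range(n_cols)}
    wave = [blk for blk, m in missing.items() if m == 0]
    waves = []
    while wave:
        waves.append(sorted(wave))
        next_wave = []
        for blk in wave:
            # Deliver this block's output to each consumer; fire consumers that become complete.
            for succ in outputs(*blk):
                missing[succ] -= 1
                if missing[succ] == 0:
                    next_wave.append(succ)
        wave = next_wave
    return waves


if __name__ == "__main__":
    for k, wave in enumerate(wavefront_waves(3, 4)):
        print("wave", k, wave)
```

The waves coincide with the anti-diagonals i + j = k of the block grid: a grid with R block rows and C block columns needs R + C - 1 waves, and the widest wave holds min(R, C) blocks, which bounds the parallelism available to the runtime at any moment.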