SaC/C formulations of the all‐pairs N‐body problem and their performance on SMPs and GPGPUs

This paper describes our experience in implementing the classical N‐body algorithm in SaC and analysing the runtime performance achieved on three different machines: a dual‐processor 8‐core Dell PowerEdge 2950 (a Beowulf cluster node, the reference machine), a quad‐core hyper‐threaded Intel Core‐...

Full description

Bibliographic details
Published in:	Concurrency and computation 2014-03, Vol.26 (4), p.952-971
Main authors:	Šinkarovs, Artjoms; Scholz, Sven‐Bodo; Bernecky, Robert; Douma, Roeland; Grelck, Clemens
Format:	Article
Language:	English
Online access:	Full text

Description
Summary:	This paper describes our experience in implementing the classical N‐body algorithm in SaC and analysing the runtime performance achieved on three different machines: a dual‐processor 8‐core Dell PowerEdge 2950 (a Beowulf cluster node, the reference machine), a quad‐core hyper‐threaded Intel Core‐i7 based system equipped with an NVidia GTX‐480 graphics accelerator and an Oracle Sparc T4‐4 server with a total of 256 hardware threads. We contrast our findings with those resulting from the reference C code and a few variants of it that employ OpenMP pragmas as well as explicit vectorisation. Our experiments demonstrate that the SaC implementation successfully combines a high level of abstraction, very close to the mathematical specification, with very competitive runtimes. In fact, SaC matches or outperforms the hand‐vectorised and hand‐parallelised C codes on all three systems under investigation without the need for any source code modification. Furthermore, only SaC is able to effectively harness the advanced compute power of the graphics accelerator, again by mere recompilation of the same source code. Our results illustrate the benefits that SaC provides to application programmers in terms of coding productivity, source code, and performance portability among different machine architectures, as well as long‐term maintainability in evolving hardware environments. Copyright © 2013 John Wiley & Sons, Ltd.
ISSN:	1532-0626, 1532-0634
DOI:	10.1002/cpe.3078