SaC/C formulations of the all-pairs N-body problem and their performance on SMPs and GPGPUs

Bibliographic details
Published in: Concurrency and computation 2014-03, Vol. 26 (4), p. 952-971
Main authors: Šinkarovs, Artjoms; Scholz, Sven-Bodo; Bernecky, Robert; Douma, Roeland; Grelck, Clemens
Format: Article
Language: English
Online access: Full text
Description
Summary: This paper describes our experience in implementing the classical N-body algorithm in SaC and analysing the runtime performance achieved on three different machines: a dual-processor 8-core Dell PowerEdge 2950 (a Beowulf cluster node, the reference machine), a quad-core hyper-threaded Intel Core-i7 based system equipped with an NVidia GTX-480 graphics accelerator, and an Oracle Sparc T4-4 server with a total of 256 hardware threads. We contrast our findings with those resulting from the reference C code and a few variants of it that employ OpenMP pragmas as well as explicit vectorisation. Our experiments demonstrate that the SaC implementation successfully combines a high level of abstraction, very close to the mathematical specification, with very competitive runtimes. In fact, SaC matches or outperforms the hand-vectorised and hand-parallelised C codes on all three systems under investigation without the need for any source code modification. Furthermore, only SaC is able to effectively harness the advanced compute power of the graphics accelerator, again by mere recompilation of the same source code. Our results illustrate the benefits that SaC provides to application programmers in terms of coding productivity, source code, and performance portability among different machine architectures, as well as long-term maintainability in evolving hardware environments. Copyright © 2013 John Wiley & Sons, Ltd.
ISSN: 1532-0626, 1532-0634
DOI: 10.1002/cpe.3078