Fine-Grained Multithreading Support for Hybrid Threaded MPI Programming

Bibliographic details
Published in:	Int. J. High Perform. Comput. Appl. 2010-02, Vol. 24 (1), p. 49-57
Main authors:	Balaji, Pavan; Buntinas, Darius; Goodell, David; Gropp, William; Thakur, Rajeev
Format:	Article
Language:	English
Subjects:	Application programming interface; Coarsening; Computation; Computer architecture; Computer programs; Computer simulation; COMPUTERS; GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; High performance computing; IMPLEMENTATION; Locks; Mathematical models; Messages; Parallel processing; PROCESSING; PROGRAMMING; Studies; Threaded
Online access:	Full text

Description
Abstract:	As high-end computing systems continue to grow in scale, recent advances in multi- and many-core architectures have pushed such growth toward more dense architectures, that is, more processing elements per physical node, rather than more physical nodes themselves. Although a large number of scientific applications have relied so far on an MPI-everywhere model for programming high-end parallel systems, this model may not be sufficient for future machines, given their physical constraints such as decreasing amounts of memory per processing element and shared caches. As a result, application and computer scientists are exploring alternative programming models that involve using MPI between address spaces and some other threaded model, such as OpenMP, Pthreads, or Intel TBB, within an address space. Such hybrid models require efficient support from an MPI implementation for MPI messages sent from multiple threads simultaneously. In this paper, we explore the issues involved in designing such an implementation. We present four approaches to building a fully thread-safe MPI implementation, with decreasing levels of critical-section granularity (from coarse-grain locks to fine-grain locks to lock-free operations) and correspondingly increasing levels of complexity. We present performance results that demonstrate the performance implications of the different approaches.
ISSN:	1094-3420, 1741-2846
DOI:	10.1177/1094342009360206