Architectural support for parallel reductions in scalable shared-memory multiprocessors

Bibliographic details
Main authors:	Garzaran, M.J., Prvulovic, M., Ye Zhang, Jula, A., Hao Yu, Rauchwerger, L., Torrellas, J.
Format:	Conference proceeding
Language:	English
Subjects:	Commutation; Concurrent computing; Delay; Merging; Parallel algorithms; Parallel machines; Parallel processing; Phased arrays; Program processors; Programming profession

Description
Summary:	Reductions are important and time-consuming operations in many scientific codes. Effective parallelization of reductions is a critical transformation for loop parallelization, especially for sparse, dynamic applications. Unfortunately, conventional reduction parallelization algorithms are not scalable. In this paper, we present new architectural support that significantly speeds up parallel reduction and makes it scalable in shared-memory multiprocessors. The required architectural changes are mostly confined to the directory controllers. Experimental results based on simulations show that the proposed support is very effective. While conventional software-only reduction parallelization delivers average speedups of only 2.7 for 16 processors, our scheme delivers average speedups of 7.6.
ISSN:	1089-796X, 1089-795X
DOI:	10.1109/PACT.2001.953304
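
Illustration (not part of the catalog record): the summary contrasts the proposed hardware support with "conventional software-only reduction parallelization" but does not say which algorithm that is. One common software approach is privatization: each worker accumulates into its own private copy of the reduction array, and the copies are merged afterwards. The sketch below assumes that approach; the function name `privatized_reduction`, its parameters, and the use of Python threads are all hypothetical choices made for illustration and do not come from the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def privatized_reduction(indices, values, num_bins, num_workers=4):
    """Compute result[indices[i]] += values[i] using per-worker private arrays.

    Assumed illustration of a privatization-based reduction; not the
    algorithm evaluated in the paper.
    """
    n = len(indices)
    # Split the iteration space into contiguous chunks, one per worker.
    chunk = (n + num_workers - 1) // num_workers

    def accumulate(worker_id):
        # Each worker updates only its own private copy, so no locking is needed.
        private = [0.0] * num_bins
        start = worker_id * chunk
        end = min(start + chunk, n)
        for i in range(start, end):
            private[indices[i]] += values[i]
        return private

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(accumulate, range(num_workers)))

    # Merge phase: touches every bin of every private copy, even bins
    # that a worker never updated.
    result = [0.0] * num_bins
    for private in partials:
        for b in range(num_bins):
            result[b] += private[b]
    return result


totals = privatized_reduction([0, 2, 0, 1, 2, 2], [1, 2, 3, 4, 5, 6], num_bins=3)
```

In this sketch the merge phase costs work proportional to `num_bins * num_workers` regardless of how few elements were actually updated, which is one plausible reason such schemes scale poorly on sparse access patterns; the record itself does not state the cause. Python threads are used only to show the structure and will not yield real speedups for this loop.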