High-Level Strategies for Parallel Shared-Memory Sparse Matrix-Vector Multiplication

Bibliographic details
Published in:	IEEE Transactions on Parallel and Distributed Systems, 2014-01, Vol. 25 (1), p. 116-125
Main authors:	Yzelman, Albert-Jan Nicholas; Roose, Dirk
Format:	Article
Language:	English
Subjects:	Bandwidth; cache-oblivious; Computer architecture; high-performance computing; Hilbert space-filling curve; Indexes; Kernel; matrix reordering; NUMA architectures; Particle separators; shared-memory parallelism; Sparse matrices; sparse matrix partitioning; Sparse matrix-vector multiplication; Vectors

Description
Summary:	The sparse matrix-vector multiplication is an important computational kernel, but is hard to efficiently execute even in the sequential case. The problems--namely low arithmetic intensity, inefficient cache use, and limited memory bandwidth--are magnified as the core count on shared-memory parallel architectures increases. Existing techniques are discussed in detail, and categorized chiefly based on their distribution types. Based on this, new parallelization techniques are proposed. The theoretical scalability and memory usage of the various strategies are analyzed, and experiments on multiple NUMA architectures confirm the validity of the results. One of the newly proposed methods attains the best average result in experiments on a large set of matrices. In one of the experiments it obtains a parallel efficiency of 90 percent, while on average it performs close to 60 percent.
ISSN:	1045-9219 1558-2183
DOI:	10.1109/TPDS.2013.31
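
Illustrative note: the kernel named in the summary, y = Ax with A sparse, can be sketched as follows. This is a minimal sketch and not one of the methods from the article. It assumes the common CSR (compressed sparse row) storage format and a simple contiguous row-block distribution over workers; the function names and the `num_workers` parameter are hypothetical and chosen only for illustration. Each stored nonzero costs one multiplication and one addition but requires loading a value, a column index, and an entry of x, which is one way to see the low arithmetic intensity the summary mentions.

```python
from concurrent.futures import ThreadPoolExecutor


def spmv_csr_rows(row_ptr, col_idx, values, x, row_start, row_end):
    """Compute the slice y[row_start:row_end] of y = A x, with A in CSR form."""
    y_part = []
    for i in range(row_start, row_end):
        acc = 0.0
        # The nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y_part.append(acc)
    return y_part


def spmv_csr_row_blocks(row_ptr, col_idx, values, x, num_workers=2):
    """Assumed row-block distribution: each worker owns a contiguous row range.

    Workers write disjoint parts of y, so no synchronization on y is needed.
    """
    n_rows = len(row_ptr) - 1
    block = max(1, (n_rows + num_workers - 1) // num_workers)
    ranges = [(s, min(s + block, n_rows)) for s in range(0, n_rows, block)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        parts = pool.map(
            lambda r: spmv_csr_rows(row_ptr, col_idx, values, x, r[0], r[1]),
            ranges,
        )
        # Concatenate the per-worker slices in row order.
        return [v for part in parts for v in part]


# Example: A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] stored in CSR.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr_row_blocks(row_ptr, col_idx, values, x, num_workers=2)
```

In CPython the threads above only illustrate how the rows are divided; because of the global interpreter lock they are not expected to yield a speedup. A compiled-language implementation would be needed for any performance measurement.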