Scaling LAPACK Panel Operations Using Parallel Cache Assignment
Published in: | ACM Transactions on Mathematical Software, 2013-07, Vol. 39 (4), p. 1-30 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | In LAPACK, many matrix operations are cast as block algorithms that iteratively process a panel using an unblocked algorithm and then update the remainder matrix using the high-performance Level 3 BLAS. The Level 3 BLAS scale excellently, but panel processing tends to be bus bound, and thus scales with bus speed rather than with the number of processors (p). Amdahl's law therefore ensures that as p grows, the panel computation will become the dominant cost of these LAPACK routines. Our contribution is a novel parallel cache assignment approach to panel factorization which we show scales well with p. We apply this general approach to the QR, QL, RQ, LQ, and LU panel factorizations. We show results for two commodity platforms: an 8-core Intel platform and a 32-core AMD platform. For both platforms and all twenty implementations (five factorizations, each available in four types), we present results demonstrating that our approach yields significant speedup over the existing state of the art. |
---|---|
ISSN: | 0098-3500 1557-7295 |
DOI: | 10.1145/2491491.2491492 |
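The panel/update split described in the abstract can be illustrated with a minimal sketch: an unpivoted, right-looking blocked LU in NumPy. This is not LAPACK's actual implementation (real `dgetrf` uses partial pivoting and tuned BLAS calls); it only shows the structure at issue, where the inner column loop is the bus-bound unblocked panel factorization and the trailing-matrix update is the Level 3 BLAS part that scales with processor count.

```python
import numpy as np

def blocked_lu(A, nb=4):
    """Unpivoted right-looking blocked LU sketch: returns F with
    L (unit lower) and U packed into one matrix, A = L @ U."""
    A = A.copy().astype(float)
    n = A.shape[0]
    for k in range(0, n, nb):
        b = min(nb, n - k)
        # --- unblocked panel factorization (the bus-bound step) ---
        for j in range(k, k + b):
            A[j + 1:, j] /= A[j, j]                      # scale L column
            A[j + 1:, j + 1:k + b] -= np.outer(          # rank-1 update of
                A[j + 1:, j], A[j, j + 1:k + b])         # the rest of the panel
        # --- Level 3 BLAS update of the trailing matrix ---
        if k + b < n:
            # TRSM-like step: U12 = L11^{-1} A12
            L11 = np.tril(A[k:k + b, k:k + b], -1) + np.eye(b)
            A[k:k + b, k + b:] = np.linalg.solve(L11, A[k:k + b, k + b:])
            # GEMM-like rank-b update: A22 -= L21 @ U12
            A[k + b:, k + b:] -= A[k + b:, k:k + b] @ A[k:k + b, k + b:]
    return A
```

Because the panel loop touches only n x nb data but makes many passes over it, its cost is dominated by memory traffic, while the GEMM-like update does O(n^2 nb) flops on data that stays resident in cache, which is why the update scales and the panel does not.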