A Fast Dense Triangular Solve in CUDA

Bibliographic Details
Published in:	SIAM Journal on Scientific Computing 2013-01, Vol.35 (3), p.C303-C322
Main author:	Hogg, J D
Format:	Article
Language:	English
Subjects:	Algorithms; Architecture; Computation; Cost engineering; Factorization; Kernels; Optimization; Run time (computers); Solvers
Online access:	Full text

Description
Summary:	The level 2 BLAS operation _trsv performs a dense triangular solve and is often used in the solve phase of a direct solver following a matrix factorization. With the advent of manycore architectures reducing the cost of compute-bound parts of the computation, memory-bound operations such as this kernel become increasingly important. This is particularly noticeable in sparse direct solvers used for optimization applications, where multiple memory-bound solves follow each (traditionally expensive) compute-bound factorization. In this paper, a high-performance implementation of the triangular solve is developed through an analysis of theoretical and practical bounds on its run time. This implementation outperforms CUBLAS by a factor of 5 to 15.
ISSN:	1064-8275 1095-7197
DOI:	10.1137/12088358X
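
For readers unfamiliar with the operation named in the summary, the following is a minimal sequential sketch of a dense lower-triangular solve by forward substitution. It is not the paper's CUDA kernel and does not reproduce its optimizations. The function name `trsv_lower` and the list-of-rows storage are assumptions made for illustration only.

```python
def trsv_lower(L, b):
    """Solve L x = b for x, where L is a dense lower-triangular matrix.

    L is a list of rows (hypothetical layout for this sketch); entries
    above the diagonal are never read.
    """
    n = len(b)
    x = list(b)  # work on a copy so that b is left untouched
    for i in range(n):
        # Subtract the contributions of the unknowns already computed.
        for j in range(i):
            x[i] -= L[i][j] * x[j]
        # Divide by the diagonal entry to obtain x[i].
        x[i] /= L[i][i]
    return x


L = [[2.0, 0.0, 0.0],
     [1.0, 1.0, 0.0],
     [4.0, 2.0, 4.0]]
b = [2.0, 3.0, 12.0]
x = trsv_lower(L, b)
```

Each of the n(n+1)/2 stored entries of L is read exactly once and takes part in a single multiply-subtract or divide, so there is little arithmetic per value loaded. That is why the summary calls the operation memory-bound.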