gSoFa: Scalable Sparse Symbolic LU Factorization on GPUs

Bibliographic Details
Published in:	IEEE Transactions on Parallel and Distributed Systems, 2022-04, Vol. 33 (4), pp. 1015-1026
Main Authors:	Gaihre, Anil; Li, Xiaoye Sherry; Liu, Hang
Format:	Article
Language:	English
Subjects:	Algorithms; Data structures; Decomposition; Design optimization; Factorization; Graphics processing units; Image edge detection; Linear algebra; LU decomposition; Mathematical analysis; MATHEMATICS AND COMPUTING; Matrix algebra; Matrix decomposition; Memory management; Parallel processing; Sparse linear algebra; sparse linear solvers; Sparse matrices; Sparsity; static symbolic factorization on GPU

Description
Summary:	Decomposing a matrix $\mathbf{A}$ into a lower matrix $\mathbf{L}$ and an upper matrix $\mathbf{U}$, which is also known as LU decomposition, is an essential operation in numerical linear algebra. For a sparse matrix, LU decomposition often introduces more nonzero entries in the $\mathbf{L}$ and $\mathbf{U}$ factors than in the original matrix. A symbolic factorization step is needed to identify the nonzero structures of the $\mathbf{L}$ and $\mathbf{U}$ matrices. Attracted by the enormous potential of Graphics Processing Units (GPUs), an array of efforts has surged to deploy various LU factorization steps, except for the symbolic factorization, to the best of our knowledge, on GPUs. This article introduces gSoFa, the first GPU-based symbolic factorization design, with the following three optimizations to enable scalable LU symbolic factorization for nonsymmetric pattern sparse matrices on GPUs. First, we introduce a novel fine-grained parallel symbolic factorization algorithm that is well suited for the Single Instruction Multiple Thread (SIMT) architecture of GPUs. Second, we tailor supernode detection into a SIMT friendly process and strive to balance the workload, minimize the commun
ISSN:	1045-9219, 1558-2183
DOI:	10.1109/TPDS.2021.3090316
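
The summary states that LU decomposition of a sparse matrix introduces extra nonzeros (fill-in) and that symbolic factorization identifies the nonzero structure of the factors. The following is a minimal sequential sketch of that idea only. It is not the gSoFa algorithm, which the summary describes as fine-grained, parallel, and GPU-based. The function name `symbolic_lu`, the set-of-positions input format, and the assumptions of no pivoting and no numerical cancellation are all illustrative choices, not taken from the article.

```python
def symbolic_lu(n, pattern):
    """Return the nonzero positions of L (strictly lower) and U (upper,
    including the diagonal) for an n x n matrix, working on structure only.

    `pattern` is a set of (row, col) positions of the nonzeros of A.
    Assumptions (illustrative): no pivoting, no numerical cancellation.
    """
    # Column indices of the nonzeros in each row.
    rows = [set() for _ in range(n)]
    for i, j in pattern:
        rows[i].add(j)

    for k in range(n):
        # Columns to the right of the pivot in row k.
        upper_k = {j for j in rows[k] if j > k}
        for i in range(k + 1, n):
            # Row i is updated by row k only if entry (i, k) is nonzero;
            # the update can create new nonzeros (fill-in) in row i.
            if k in rows[i]:
                rows[i] |= upper_k

    lower = {(i, j) for i in range(n) for j in rows[i] if j < i}
    upper = {(i, j) for i in range(n) for j in rows[i] if j >= i}
    return lower, upper


# A 4 x 4 "arrow" pattern: dense first row and column plus the diagonal.
pattern = {(0, 0), (0, 1), (0, 2), (0, 3),
           (1, 0), (1, 1),
           (2, 0), (2, 2),
           (3, 0), (3, 3)}

lower, upper = symbolic_lu(4, pattern)
fill_in = (lower | upper) - pattern
print(len(pattern), len(lower) + len(upper), sorted(fill_in))
```

In this example, eliminating the first column fills every remaining position, so the factors together hold 16 nonzeros where the original pattern had 10. This shows why the nonzero structure has to be determined before memory for the numerical factorization can be allocated.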