gSoFa: Scalable Sparse Symbolic LU Factorization on GPUs
Published in: | IEEE Transactions on Parallel and Distributed Systems, 2022-04, Vol. 33 (4), p. 1015-1026 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Abstract: | Decomposing a matrix $\mathbf{A}$ into a lower matrix $\mathbf{L}$ and an upper matrix $\mathbf{U}$, also known as LU decomposition, is an essential operation in numerical linear algebra. For a sparse matrix, LU decomposition often introduces more nonzero entries in the $\mathbf{L}$ and $\mathbf{U}$ factors than in the original matrix. A symbolic factorization step is needed to identify the nonzero structures of the $\mathbf{L}$ and $\mathbf{U}$ matrices. Attracted by the enormous potential of Graphics Processing Units (GPUs), an array of efforts has surged to deploy various LU factorization steps on GPUs, with the exception, to the best of our knowledge, of symbolic factorization. This article introduces gSoFa, the first GPU-based symbolic factorization design, with the following three optimizations to enable scalable LU symbolic factorization for nonsymmetric pattern sparse matrices on GPUs. First, we introduce a novel fine-grained parallel symbolic factorization algorithm that is well suited for the Single Instruction Multiple Thread (SIMT) architecture of GPUs. Second, we tailor supernode detection into a SIMT-friendly process and strive to balance the workload, minimize the commun |
ISSN: | 1045-9219, 1558-2183 |
DOI: | 10.1109/TPDS.2021.3090316 |
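
The abstract notes that LU decomposition of a sparse matrix introduces extra nonzeros (fill-in) and that a symbolic factorization step identifies the nonzero structure of the factors. The following is a minimal sequential sketch of that idea only, not gSoFa's GPU algorithm: it performs a naive row-by-row symbolic elimination on a sparsity pattern, assuming a nonzero diagonal and no pivoting. The function names `symbolic_lu` and `fill_in` are hypothetical and chosen for illustration.

```python
def symbolic_lu(pattern, n):
    """Return the set of (row, col) positions that are structurally nonzero
    in L + U for an n x n sparsity pattern.

    Assumptions (not taken from the paper): the diagonal is nonzero,
    no pivoting is performed, and numerical cancellation is ignored.
    """
    # Store the pattern as one set of column indices per row.
    rows = [set() for _ in range(n)]
    for i, j in pattern:
        rows[i].add(j)
    for i in range(n):
        rows[i].add(i)  # assumed nonzero diagonal

    # Eliminating column k from row i copies the structure of row k
    # to the right of the diagonal into row i; new entries are fill-in.
    for k in range(n):
        upper_k = {j for j in rows[k] if j > k}
        for i in range(k + 1, n):
            if k in rows[i]:
                rows[i] |= upper_k

    return {(i, j) for i in range(n) for j in rows[i]}


def fill_in(pattern, n):
    """Return the sorted positions that are nonzero in L + U but not in A."""
    original = set(pattern) | {(i, i) for i in range(n)}
    return sorted(symbolic_lu(pattern, n) - original)


if __name__ == "__main__":
    # "Arrow" pattern: dense first row and first column, plus the diagonal.
    arrow = [(0, j) for j in range(4)] + [(i, 0) for i in range(4)]
    print(fill_in(arrow, 4))
    # → [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
```

In this example the trailing 3 x 3 block fills in completely, which illustrates why the factors can hold many more nonzeros than the original matrix. The naive loop does work proportional to the filled structure for every pivot and is inherently sequential over `k`, which is the kind of cost that motivates specialized parallel symbolic algorithms.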