Parallel nested dissection

Bibliographic details
Published in:	Parallel computing 1990-12, Vol.16 (2), p.139-156
Main author:	Conroy, John M
Format:	Article
Language:	English
Subjects:	Algorithmics. Computability. Computer arithmetics; Applied sciences; Block algorithm; Cholesky factorization; Computer science; control theory; systems; Efficiency analysis; Exact sciences and technology; Mathematics; Nested dissection; Numerical analysis; Numerical analysis. Scientific computation; Numerical linear algebra; Sciences and techniques of general use; Sparse linear systems solver; Theoretical computing
Online access:	Full text

Description
Summary:	Nested dissection is a widely used direct method for solving sparse linear systems that arise from finite difference and finite element methods. Worley and Schreiber [16] give a fine grain algorithm for a square array of processors. Their algorithm uses O(N^2) processors, each with O(N) memory, to factor an N^2 by N^2 sparse matrix whose graph is an N × N mesh. The efficiency of their method is between 1/46 and 1/12. George et al. [6], [8] give a medium grain algorithm for hypercube architectures, while George et al. [7] give an algorithm for shared memory machines. These papers present a column oriented approach which can exploit O(N) parallelism and yield efficiencies up to 50%. Lucas [11] also gives a column oriented scheme which achieves up to 75% efficiency and O(N) parallelism. In this paper, we present a medium to fine grain algorithm for a P × P array of processors with local memory. This algorithm can exploit up to O(N^2) parallelism. The efficiency of the fine grain version is comparable to that of [16], while the medium grain version achieves about 49% efficiency. The strength of the method is due to three factors: its ability to pipeline much of the computation, its overlapping of computation and communication, and its use of level 3 BLAS-like primitives. In addition to its high efficiency, its memory requirement is optimal: only O(N^2 log N / P^2) words of memory are needed per processor.
ISSN:	0167-8191 1872-7336
DOI:	10.1016/0167-8191(90)90054-D
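
Illustration: the summary refers to nested dissection on an N × N mesh without showing the ordering itself. The following is a minimal serial sketch of that ordering, not the parallel algorithm of the paper. The function name `nested_dissection_order`, the choice to split along the longer side, and the 2 × 2 base case are all assumptions made for illustration.

```python
def nested_dissection_order(rows, cols):
    """Return grid points (r, c) in elimination order.

    Illustrative assumption: the mesh is split by a middle row or column
    (the separator), both halves are ordered recursively, and the
    separator points are numbered last.
    """
    rows, cols = list(rows), list(cols)
    if not rows or not cols:
        return []
    # Small blocks are eliminated in natural (row-major) order.
    if len(rows) <= 2 and len(cols) <= 2:
        return [(r, c) for r in rows for c in cols]
    if len(rows) >= len(cols):
        # Split along the longer side with a separator row.
        mid = len(rows) // 2
        separator = [(rows[mid], c) for c in cols]
        first = nested_dissection_order(rows[:mid], cols)
        second = nested_dissection_order(rows[mid + 1:], cols)
    else:
        # Split with a separator column.
        mid = len(cols) // 2
        separator = [(r, cols[mid]) for r in rows]
        first = nested_dissection_order(rows, cols[:mid])
        second = nested_dissection_order(rows, cols[mid + 1:])
    # The two halves are independent of each other; the separator goes last.
    return first + second + separator


if __name__ == "__main__":
    order = nested_dissection_order(range(7), range(7))
    print(order[-7:])  # the top-level separator: the middle row of the mesh
```

Because the two halves on either side of a separator share no mesh edges, they can be eliminated independently, which is the source of the parallelism that the algorithms cited in the summary exploit.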