A High-Performance Scalable Shared-Memory SVD Processor Architecture Based on Jacobi Algorithm and Batcher's Sorting Network

Bibliographic details
Published in: IEEE Transactions on Circuits and Systems I: Regular Papers, 2020-06, Vol. 67 (6), pp. 1912-1924
Main authors: Shahshahani, Seyed Mohamad Reza; Mahdiani, Hamid Reza
Format: Article
Language: English
Description
Summary: Eigenvalue Decomposition (EVD) and Singular Value Decomposition (SVD) are two crucial transformations in many signal processing applications. The main drawback of these algorithms is their computationally intensive nature, which prevents them from being exploited efficiently in high-performance, real-time, and mobile applications. By extracting the inherent parallelism of the Jacobi SVD, a new parallel data distribution and access pattern for this algorithm is first proposed. Based on this data distribution, a novel shared-memory architecture is then proposed to support EVD/SVD computation in a high-performance and scalable manner. A new Multistage Interconnection Network based on Batcher's odd-even merge sorting network is developed and used in the architecture to preserve its performance and scalability by simultaneously connecting different numbers of processing elements to the system memory hierarchy in a parallel, conflict-free manner. The proposed architecture can be configured to compute the EVD/SVD of matrices of arbitrary size, with different numbers of processing elements achieving a linear speed-up. Synthesis results in a 90 nm technology show that, for an 8 × 8 matrix at a frequency of 813 MHz, the system with one, two, and four processing elements achieves a throughput of 1.81, 3.63, and 7.26 million EVD/SVDs per second, respectively.
ISSN: 1549-8328, 1558-0806
DOI: 10.1109/TCSI.2020.2973249
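
The summary names Batcher's odd-even merge sorting network as the basis of the interconnection network. As a rough illustration of that classic network only, the following Python sketch generates its comparator list for a power-of-two number of wires. The function names (`oddeven_merge`, `oddeven_merge_sort`, `batcher_network`, `apply_network`) are hypothetical, and this record does not say how the article adapts the network for memory access, so nothing here should be read as the authors' design.

```python
def oddeven_merge(lo, hi, r):
    """Comparators that merge two sorted halves of wires lo..hi (inclusive).

    r is the stride between the wires considered at this recursion level.
    """
    step = 2 * r
    if step < hi - lo:
        pairs = oddeven_merge(lo, hi, step)        # merge the even subsequence
        pairs += oddeven_merge(lo + r, hi, step)   # merge the odd subsequence
        # Final stage: compare neighbouring odd/even wires.
        pairs += [(i, i + r) for i in range(lo + r, hi - r, step)]
        return pairs
    return [(lo, lo + r)]


def oddeven_merge_sort(lo, hi):
    """Comparators that sort wires lo..hi (inclusive)."""
    if hi - lo < 1:
        return []
    mid = lo + (hi - lo) // 2
    return (oddeven_merge_sort(lo, mid)
            + oddeven_merge_sort(mid + 1, hi)
            + oddeven_merge(lo, hi, 1))


def batcher_network(n):
    """Comparator list for n wires; n is assumed to be a power of two."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    return oddeven_merge_sort(0, n - 1)


def apply_network(net, values):
    """Run the comparators in order, each a compare-and-swap on two wires."""
    out = list(values)
    for a, b in net:
        if out[a] > out[b]:
            out[a], out[b] = out[b], out[a]
    return out
```

For example, `batcher_network(8)` yields 19 comparators, and `apply_network(batcher_network(4), [3, 1, 2, 0])` returns `[0, 1, 2, 3]`. The comparator sequence is fixed and independent of the data, which is the property that makes such networks attractive as hardware switching fabrics.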