Efficient hardware‐accelerated pseudoinverse computation through algorithm restructuring for parallelization in high‐level synthesis

Bibliographic details
Published in:	International journal of circuit theory and applications 2022-02, Vol.50 (2), p.394-416
Main authors:	Tan, Chong Yeam; Ooi, Chia Yee; Choo, Hau Sim; Ismail, Nordinah
Format:	Article
Language:	English
Subjects:	Algorithms; Computational efficiency; Field programmable gate arrays; Gram–Schmidt QR decomposition; Hardware; high‐level synthesis; loop optimization; Matrices (mathematics); Modules; Multiplication; Parallel processing; pseudoinverse; Reduction; Synthesis
Online access:	Full text

Description
Abstract:	This paper describes a fast and efficient hardware‐accelerated pseudoinverse computation achieved by restructuring the algorithm and applying FPGA synthesis directives for parallelism prior to high‐level synthesis (HLS). The algorithm, which is composed of modified Gram–Schmidt QR decomposition (MGS‐QRD), triangular matrix inversion (TMI), and matrix multiplication (MM), is synthesized and implemented on a field‐programmable gate array (FPGA). MGS‐QRD is restructured and augmented with parallelism directives before synthesis, which yields an MGS‐QRD hardware accelerator with high throughput. Modifications to the existing TMI algorithm are also proposed: redundant computational tasks are removed to speed up the overall operation. Data dependencies in the MM algorithm are analyzed so that appropriate parallelism directives can be inserted, and the data flow of MM is matched with that of the MGS‐QRD and TMI modules to accelerate the pseudoinverse computation. The results show that the proposed pseudoinverse module outperforms a naïve implementation composed of existing MGS‐QRD, TMI, and standard MM modules in terms of maximum frequency (1.24× speedup), hardware resources (48% reduction in DSP usage), latency (23% reduction), and throughput (62% increase). In short, the paper introduces an optimized structure for the MGS‐QRD algorithm that leads to a high‐throughput pseudoinverse accelerator, and integrating MGS‐QRD, TMI, and MM using loop optimization techniques and removal of redundant matrix elements leads to more efficient pseudoinverse computation.
ISSN:	0098-9886 1097-007X
DOI:	10.1002/cta.3155
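
Illustrative sketch (not from the article): the abstract names a three-stage pipeline of MGS‐QRD, TMI, and MM. For a matrix A with full column rank and A = QR, the pseudoinverse is R⁻¹Qᵀ, so the three stages can be mirrored in a plain software reference model as shown below. This is only a minimal sketch under stated assumptions. The function names (`mgs_qr`, `upper_tri_inverse`, `pinv`) are hypothetical, the input is assumed to have full column rank, and none of the article's restructuring, parallelism directives, or specific TMI modifications are reproduced here.

```python
def mgs_qr(a):
    """Modified Gram-Schmidt QR of an m x n matrix given as a list of rows.

    Assumes full column rank. Returns (qt, r), where qt holds the rows of
    Q^T (n x m) and r is the n x n upper triangular factor.
    """
    m, n = len(a), len(a[0])
    # Work column by column: v[j] is the j-th column of A.
    v = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    qt = [[0.0] * m for _ in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        r[k][k] = sum(x * x for x in v[k]) ** 0.5
        qt[k] = [x / r[k][k] for x in v[k]]
        # MGS: immediately remove the q_k component from every later column.
        for j in range(k + 1, n):
            r[k][j] = sum(q * x for q, x in zip(qt[k], v[j]))
            v[j] = [x - r[k][j] * q for x, q in zip(v[j], qt[k])]
    return qt, r


def upper_tri_inverse(r):
    """Invert an upper triangular matrix by back substitution.

    Only entries on or above the diagonal are computed; the entries below
    the diagonal are known to be zero and are never touched.
    """
    n = len(r)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1.0 / r[j][j]
        for i in range(j - 1, -1, -1):
            s = sum(r[i][k] * inv[k][j] for k in range(i + 1, j + 1))
            inv[i][j] = -s / r[i][i]
    return inv


def pinv(a):
    """Pseudoinverse A+ = R^-1 Q^T for a full-column-rank matrix A."""
    qt, r = mgs_qr(a)
    r_inv = upper_tri_inverse(r)
    n, m = len(r), len(a)
    # Skip k < i because r_inv[i][k] is zero there (upper triangular).
    return [[sum(r_inv[i][k] * qt[k][j] for k in range(i, n))
             for j in range(m)] for i in range(n)]


# Usage: a small 3 x 2 example with full column rank.
a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
p = pinv(a)
```

In the sketch, the matrix multiplication skips the zero entries of the triangular inverse. That is a generic software shortcut chosen for illustration and should not be read as the article's method for removing redundant matrix elements.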