Efficient hardware‐accelerated pseudoinverse computation through algorithm restructuring for parallelization in high‐level synthesis
Published in: | International journal of circuit theory and applications 2022-02, Vol.50 (2), p.394-416 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Summary
This paper describes a fast and efficient hardware‐accelerated pseudoinverse computation, obtained by restructuring the algorithm and applying synthesis directives for parallelism prior to high‐level synthesis (HLS). The algorithm, which is composed of modified Gram–Schmidt QR decomposition (MGS‐QRD), triangular matrix inversion (TMI), and matrix multiplication (MM), is synthesized and implemented on a field‐programmable gate array (FPGA). MGS‐QRD is restructured and augmented with parallelism directives before synthesis, which yields an MGS‐QRD hardware accelerator with high throughput. Modifications to the existing TMI algorithm are also proposed: redundant computational tasks are removed to speed up the overall operation. Data dependencies in the MM algorithm are analyzed so that appropriate parallelism directives can be inserted, and the data flow of MM is matched with that of the MGS‐QRD and TMI modules to accelerate the pseudoinverse computation. The results show that the proposed pseudoinverse module outperforms a naïve implementation, composed of existing MGS‐QRD, TMI, and a standard MM, in terms of maximum frequency (1.24× speedup), hardware resources (48% reduction in DSP usage), latency (23% reduction), and throughput (62% increase).
This paper introduces an optimized structure for the modified Gram–Schmidt QR decomposition (MGS‐QRD) algorithm, which leads to a pseudoinverse computation accelerator with high throughput. Integrating MGS‐QRD, triangular matrix inversion, and matrix multiplication using loop optimization techniques and the removal of redundant matrix elements leads to more efficient pseudoinverse computation. |
---|---|
ISSN: | 0098-9886 1097-007X |
DOI: | 10.1002/cta.3155 |
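
The three stages named in the abstract (MGS‐QRD, then TMI, then MM) match the standard identity A⁺ = R⁻¹Qᵀ, which holds when A = QR has full column rank. The following is a minimal software reference sketch of that pipeline in plain Python. It is not the paper's restructured HLS design, and it contains no synthesis directives. The function names `mgs_qr`, `upper_tri_inverse` and `pinv` are hypothetical, chosen only for illustration.

```python
def mgs_qr(A):
    """Modified Gram-Schmidt QR of an m x n matrix (full column rank assumed).

    Returns Q as a list of n columns (each of length m) and R as an n x n
    upper-triangular matrix.
    """
    m, n = len(A), len(A[0])
    V = [[float(A[i][j]) for i in range(m)] for j in range(n)]  # columns of A
    Q = [[0.0] * m for _ in range(n)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = sum(x * x for x in V[j]) ** 0.5
        Q[j] = [x / R[j][j] for x in V[j]]
        # "Modified" step: remove the q_j component from every remaining
        # column immediately, instead of projecting against the original A.
        for k in range(j + 1, n):
            R[j][k] = sum(q * v for q, v in zip(Q[j], V[k]))
            V[k] = [v - R[j][k] * q for q, v in zip(Q[j], V[k])]
    return Q, R


def upper_tri_inverse(R):
    """Invert an upper-triangular matrix by back substitution.

    Only the upper triangle is computed; entries below the diagonal stay zero.
    """
    n = len(R)
    X = [[0.0] * n for _ in range(n)]
    for j in range(n):
        X[j][j] = 1.0 / R[j][j]
        for i in range(j - 1, -1, -1):
            s = sum(R[i][k] * X[k][j] for k in range(i + 1, j + 1))
            X[i][j] = -s / R[i][i]
    return X


def pinv(A):
    """Pseudoinverse of a full-column-rank matrix as R^{-1} Q^T (n x m)."""
    Q, R = mgs_qr(A)
    R_inv = upper_tri_inverse(R)
    n, m = len(R), len(A)
    # Q is stored column-wise, so Q[k][c] is element (k, c) of Q^T.
    # The sum starts at k = r because R_inv is upper triangular.
    return [
        [sum(R_inv[r][k] * Q[k][c] for k in range(r, n)) for c in range(m)]
        for r in range(n)
    ]
```

A model like this is mainly useful as a golden reference when checking a hardware implementation. For a full-column-rank input `A`, multiplying `pinv(A)` by `A` should return the identity up to rounding error.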