GBC: An Energy-Efficient LSTM Accelerator With Gating Units Level Balanced Compression Strategy

Bibliographic details
Published in:	IEEE Transactions on Circuits and Systems I: Regular Papers, 2022-09, Vol. 69 (9), p. 3655-3665
Main authors:	Wu, Bi; Wang, Zhengkuan; Chen, Ke; Yan, Chenggang; Liu, Weiqiang
Format:	Article
Language:	English
Subjects:	Automatic speech recognition; Compression algorithms; Correlation; Energy efficiency; Field programmable gate arrays; FPGAs; Logic gates; LSTM; Machine translation; Natural language processing; Neural network compression; Neural networks; Recurrent neural networks; RNN; Task analysis

Description
Abstract:	Recurrent Neural Networks (RNNs) have emerged as one of the most popular neural networks for processing time-series problems and are widely used in machine translation, automatic speech recognition, and other natural language processing applications. However, conventional RNNs suffer from vanishing and exploding gradients, resulting in poor network performance in applications that depend on long-term input information. As a variant of the RNN, Long Short-Term Memory (LSTM) was proposed to tackle this issue. However, LSTM introduces gating units and many additional parameters, which makes it challenging to implement directly on resource-limited platforms such as Field Programmable Gate Arrays (FPGAs). This work first investigates the overall maximum achievable compression rates of the different gating units and the correlations among them. A Gating Units Level Balanced Compression (GBC) strategy is then proposed. After Top-k pruning, the proposed GBC strategy attains a compression rate of 36.6× for LSTM. Further, theoretical analysis indicates that the GBC strategy can still compress existing gating-units-level LSTM compression variants further. To verify this analysis, GBC is applied as a complementary compression to the existing coupled-gate LSTM. Experimental results show that GBC achieves an additional 32× (overall 42.7×) compression rate with negligible accuracy loss. Finally, hardware experiments conducted on Xilinx ADM-PCIE-7V3 FPGAs demonstrate that the accelerator designed in this paper improves energy efficiency by 7.4%-191.5% compared with state-of-the-art designs.
ISSN:	1549-8328, 1558-0806
DOI:	10.1109/TCSI.2022.3181975
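
Illustrative note: the abstract refers to Top-k pruning and to compression rates measured per gating unit. The sketch below shows generic top-k magnitude pruning with a separate budget for each LSTM gate, and how an overall compression rate follows from the surviving weights. It is not the GBC algorithm itself; this record does not describe how GBC chooses its per-gate budgets. The function names, toy weights, and budgets are all hypothetical.

```python
def top_k_prune(weights, k):
    """Keep the k largest-magnitude entries of a flat weight list; zero the rest."""
    if k <= 0:
        return [0.0] * len(weights)
    if k >= len(weights):
        return list(weights)
    # Indices sorted by descending |w|; sorted() is stable, so ties keep list order.
    order = sorted(range(len(weights)), key=lambda i: -abs(weights[i]))
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


def compression_rate(original, pruned):
    """Ratio of the original parameter count to the surviving nonzero parameters."""
    nonzero = sum(1 for w in pruned if w != 0.0)
    return len(original) / nonzero if nonzero else float("inf")


# Hypothetical toy weights for the four LSTM gating units (not taken from the paper).
gates = {
    "input":  [0.9, -0.1, 0.05, 0.4, -0.7, 0.02, 0.3, -0.2],
    "forget": [0.8, 0.6, -0.5, 0.1, 0.05, -0.9, 0.2, 0.3],
    "cell":   [0.1, -0.2, 0.7, 0.03, 0.0, 0.5, -0.6, 0.04],
    "output": [0.2, 0.1, -0.3, 0.9, 0.01, -0.05, 0.4, 0.6],
}

# Hypothetical per-gate budgets: each gate may keep a different number of weights.
budget = {"input": 2, "forget": 4, "cell": 2, "output": 2}

pruned = {name: top_k_prune(w, budget[name]) for name, w in gates.items()}

# Overall rate across all gates: total parameters divided by surviving parameters.
total = sum(len(w) for w in gates.values())
kept = sum(1 for w in pruned.values() for x in w if x != 0.0)
overall_rate = total / kept

for name in gates:
    print(name, compression_rate(gates[name], pruned[name]))
print("overall", overall_rate)
```

In this toy setup the forget gate keeps more weights than the others, so its individual rate is lower while the overall rate reflects all four gates together. Real LSTM weights are matrices, and pruning is typically followed by retraining, which this sketch omits.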