An FPGA-Based Residual Recurrent Neural Network for Real-Time Video Super-Resolution
Published in: | IEEE Transactions on Circuits and Systems for Video Technology, 2022-04, Vol. 32 (4), p. 1739-1750 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | English |
Abstract: | In this paper, we propose a hardware-efficient residual recurrent neural network for real-time video super-resolution (VSR) on a field-programmable gate array (FPGA). Although recent learning-based VSR methods have achieved remarkable performance, their high computational complexity prohibits deploying sophisticated VSR models on FPGAs for real-time applications. Limited by hardware resources, state-of-the-art FPGA-based VSR methods perform single-image super-resolution over the video sequence and suffer from temporal inconsistency. To exploit inter-frame temporal correlation for real-time VSR on low-complexity hardware, we introduce a hardware-efficient recurrent neural network, ERVSR. Specifically, ERVSR uses the input frame and the temporal information carried in the hidden state to reconstruct the high-resolution counterpart. To reduce the number of network parameters, the low-resolution input branch and the hidden-state branch are convolved separately, and a channel modulation coefficient is proposed to explicitly guide how many output feature channels the network allocates to each branch. Additionally, to reduce memory consumption, we apply a dedicated lightweight compression to the hidden state, consisting of a statistical normalization scheme followed by fixed-point quantization. We also adopt group convolution and depthwise separable convolution to make the network more compact. We evaluate ERVSR on multiple public datasets from different aspects. Experimental results demonstrate that ERVSR outperforms existing state-of-the-art FPGA-based VSR methods in both image quality and data throughput. |
---|---|
ISSN: | 1051-8215, 1558-2205 |
DOI: | 10.1109/TCSVT.2021.3080241 |
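
The abstract describes compressing the hidden state by statistical normalization followed by fixed-point quantization, but gives no details. The sketch below is one plausible reading, not the authors' implementation: it assumes normalization by the mean and standard deviation of a single feature channel, clipping at a hypothetical `clip_sigma` of 3, and signed 8-bit codes. The function names and all parameters are illustrative assumptions.

```python
import statistics


def compress_hidden_state(values, num_bits=8, clip_sigma=3.0):
    """Normalize one channel by its mean/std, then quantize to signed integer codes.

    Assumed scheme (not taken from the paper). Returns (codes, mean, std);
    mean and std are kept as side information for reconstruction.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0   # guard against a constant channel
    q_max = 2 ** (num_bits - 1) - 1           # largest code, e.g. 127 for 8 bits
    scale = q_max / clip_sigma                # codes per unit of normalized value
    codes = []
    for v in values:
        z = (v - mean) / std                          # statistical normalization
        z = max(-clip_sigma, min(clip_sigma, z))      # clip outliers
        codes.append(round(z * scale))                # integer in [-q_max, q_max]
    return codes, mean, std


def decompress_hidden_state(codes, mean, std, num_bits=8, clip_sigma=3.0):
    """Invert the quantization and normalization (lossy for clipped values)."""
    q_max = 2 ** (num_bits - 1) - 1
    scale = q_max / clip_sigma
    return [c / scale * std + mean for c in codes]


if __name__ == "__main__":
    channel = [0.0, 1.0, 2.0, 3.0, 4.0]
    codes, mean, std = compress_hidden_state(channel)
    print(codes)
    print(decompress_hidden_state(codes, mean, std))
```

Under these assumptions, each hidden-state value is stored as an 8-bit code instead of a 32-bit float, at the cost of a small reconstruction error and two side values per channel. Normalizing before quantizing keeps the fixed code range matched to the actual spread of the data.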