A High-Performance Accelerator for Real-Time Super-Resolution on Edge FPGAs

In the digital era, the prevalence of low-quality images contrasts with the widespread use of high-definition displays, primarily due to low-resolution cameras and compression technologies. Image super-resolution (SR) techniques, particularly those leveraging deep learning, aim to enhance these imag...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	ACM transactions on design automation of electronic systems 2024-05, Vol.29 (3), p.1-25, Article 53
Hauptverfasser:	Liu, Hongduo, Qian, Yijian, Liang, Youqiang, Zhang, Bin, Liu, Zhaohan, He, Tao, Zhao, Wenqian, Lu, Jiangbo, Yu, Bei
Format:	Artikel
Sprache:	eng
Schlagworte:	Computer systems organization Embedded hardware
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	In the digital era, the prevalence of low-quality images contrasts with the widespread use of high-definition displays, primarily due to low-resolution cameras and compression technologies. Image super-resolution (SR) techniques, particularly those leveraging deep learning, aim to enhance these images for high-definition presentation. However, real-time execution of deep neural network (DNN)-based SR methods at the edge poses challenges due to their high computational and storage requirements. To address this, field-programmable gate arrays (FPGAs) have emerged as a promising platform, offering flexibility, programmability, and adaptability to evolving models. Previous FPGA-based SR solutions have focused on reducing computational and memory costs through aggressive simplification techniques, often sacrificing the quality of the reconstructed images. This paper introduces a novel SR network specifically designed for edge applications, which maintains reconstruction performance while managing computation costs effectively. Additionally, we propose an architectural design that enables the real-time and end-to-end inference of the proposed SR network on embedded FPGAs. Our key contributions include a tailored SR algorithm optimized for embedded FPGAs, a DSP-enhanced design that achieves a significant four-fold speedup, a novel scalable cache strategy for handling large feature maps, optimization of DSP cascade consumption, and a constraint optimization approach for resource allocation. Experimental results demonstrate that our FPGA-specific accelerator surpasses existing solutions, delivering superior throughput, energy efficiency, and image quality.
ISSN:	1084-4309 1557-7309
DOI:	10.1145/3652855