A CNN Inference Accelerator on FPGA With Compression and Layer-Chaining Techniques for Style Transfer Applications

Bibliographic details
Published in:	IEEE Transactions on Circuits and Systems I: Regular Papers, 2023-04, Vol. 70 (4), pp. 1-14
Main authors:	Kim, Suchang; Jang, Boseon; Lee, Jaeyoung; Bae, Hyungjoon; Jang, Hyejung; Park, In-Cheol
Format:	Article
Language:	English
Keywords:	Artificial neural networks; Chaining; Chips (memory devices); Coders; Complexity; compression; Computer architecture; Computer vision; Convolution; Convolutional neural network (CNN); Convolutional neural networks; Data compression; Encoders-Decoders; Feature maps; field programmable gate array (FPGA); Field programmable gate arrays; Hardware; Image resolution; Inference; Labeling; neural processing unit (NPU); style transfer application; Superresolution; Throughput

Description
Abstract:	Recently, convolutional neural networks (CNNs) have been actively applied to computer vision applications such as style transfer, which changes the style of a content image into that of a style image. Because style transfer CNNs are based on an encoder-decoder network architecture and must handle the high-resolution images that have become mainstream, their computational complexity and feature map sizes are very large, which prevents these CNNs from being implemented on an FPGA. This paper proposes a CNN inference accelerator for style transfer applications that employs network compression and layer-chaining techniques. The network compression technique gives a style transfer CNN low computational complexity and a small number of parameters, and an efficient data compression method is proposed to reduce the feature map size. In addition, a layer-chaining technique is proposed to reduce off-chip memory traffic and thus increase throughput at the cost of a small amount of additional hardware. In the proposed hardware architecture, the neural processing unit is designed to take the proposed data compression and layer-chaining techniques into account. A prototype accelerator implemented on an FPGA board achieves a throughput comparable to that of state-of-the-art accelerators developed for encoder-decoder CNNs.
ISSN:	1549-8328, 1558-0806
DOI:	10.1109/TCSI.2023.3234640