OctCNN: A High Throughput FPGA Accelerator for CNNs using Octave Convolution Algorithm

Bibliographic details
Published in:	IEEE transactions on computers 2022-01, Vol.71 (8), p.1-1
Main authors:	Lou, Wenqi; Gong, Lei; Wang, Chao; Du, Zidong; Zhou, Xuehai
Format:	Article
Language:	eng
Subjects:	accelerators Artificial neural networks Computational modeling Computer architecture Convolution Convolutional neural networks Design optimization design space exploration Field programmable gate arrays FPGA Greedy algorithms Hardware Heuristic methods Kernel Multilayers octave convolution Power consumption Prototypes Search algorithms Signal processing algorithms

container_end_page	1
container_issue	8
container_start_page	1
container_title	IEEE transactions on computers
container_volume	71
creator	Lou, Wenqi; Gong, Lei; Wang, Chao; Du, Zidong; Zhou, Xuehai
description	With the rapid development of convolutional neural networks (CNNs), FPGAs have become one of the most attractive candidates for deploying CNNs. However, previous FPGA solutions based on traditional convolution are still limited by computational power. In this article, we introduce octave convolution (OctConv) into CNN accelerator design for the first time to improve hardware acceleration efficiency, and we design a dedicated OctPU that maps OctConv to FPGAs using a parallel dataflow pattern to exploit the parallelism of OctConv. We then present a novel, scalable architecture that dynamically combines an inter-layer pipelined structure with a multi-layer reuse structure. To obtain an optimized solution, we build a multidimensional performance and resource analysis model and a two-stage search algorithm based on greedy and heuristic algorithms. We evaluate our proposal by implementing VGG16 and ResNet50 on the Xilinx VU9P FPGA. Experimental results show that our prototypes achieve an average of 3321 GOP/s on the convolutional layers of VGG16 and 2873 GOP/s on the whole of ResNet50 using OctConv. Compared with previous works based on traditional convolution, our prototypes achieve a 1.72× to 2.33× speedup in throughput and a 2.01× to 5.18× improvement in computational density. Our design also offers an excellent compromise between performance and generalization.
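The efficiency argument in the abstract rests on OctConv storing a fraction alpha of the channels at half spatial resolution, so fewer multiply-accumulates (MACs) are needed per layer. The record does not say which alpha the authors use or how the OctPU schedules the work, so the following is only a minimal sketch, assuming the standard OctConv formulation of Chen et al. (ICCV 2019) with equal input and output ratios (alpha_in = alpha_out = alpha), stride-1 convolutions, even feature-map sizes, and pooling/upsampling costs ignored. The helper names (conv_macs, octconv_macs, cost_ratio) and the 256-channel, 56 x 56 example layer are hypothetical and chosen for illustration, not taken from the paper. The code counts MACs along the four OctConv paths (high-to-high, high-to-low, low-to-high, low-to-low) and checks the total against the closed form 1 - (3/4) * alpha * (2 - alpha).

```python
from fractions import Fraction


def conv_macs(c_in, c_out, h, w, k):
    """MACs of a stride-1, 'same'-padded k x k convolution with an h x w output."""
    return c_in * c_out * h * w * k * k


def octconv_macs(c_in, c_out, h, w, k, alpha):
    """MACs of an octave convolution with alpha_in = alpha_out = alpha (assumption).

    High-frequency maps live at h x w, low-frequency maps at (h/2) x (w/2).
    Only the four convolutions are counted; pooling and upsampling are ignored.
    """
    a = Fraction(alpha)
    cin_l, cout_l = a * c_in, a * c_out           # low-frequency channels
    cin_h, cout_h = c_in - cin_l, c_out - cout_l  # high-frequency channels
    hl, wl = Fraction(h, 2), Fraction(w, 2)       # low-frequency resolution

    h_to_h = conv_macs(cin_h, cout_h, h, w, k)    # stays at full resolution
    h_to_l = conv_macs(cin_h, cout_l, hl, wl, k)  # pool high input, then convolve
    l_to_h = conv_macs(cin_l, cout_h, hl, wl, k)  # convolve at low res, then upsample
    l_to_l = conv_macs(cin_l, cout_l, hl, wl, k)  # stays at half resolution
    return h_to_h + h_to_l + l_to_h + l_to_l


def cost_ratio(alpha):
    """Closed form of OctConv MACs / vanilla MACs: 1 - (3/4) * alpha * (2 - alpha)."""
    a = Fraction(alpha)
    return 1 - Fraction(3, 4) * a * (2 - a)


if __name__ == "__main__":
    # Hypothetical 3x3 layer, 256 -> 256 channels, on a 56 x 56 feature map.
    vanilla = conv_macs(256, 256, 56, 56, 3)
    for alpha in (0, Fraction(1, 8), Fraction(1, 4), Fraction(1, 2)):
        ratio = octconv_macs(256, 256, 56, 56, 3, alpha) / vanilla
        assert ratio == cost_ratio(alpha)  # path-by-path count matches closed form
        print(f"alpha={alpha}: {float(ratio):.4f} of vanilla MACs")
```

Under these assumptions, alpha = 1/2 leaves 7/16 (43.75%) of the vanilla MACs. Fewer arithmetic operations per layer is one plausible route to higher throughput on a fixed DSP budget, although real gains also depend on memory bandwidth and the dataflow, which this sketch does not model.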
doi_str_mv	10.1109/TC.2021.3110413
format	Article
publisher	New York: IEEE
rights	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
coden	ITCOB4
identifier	ISSN: 0018-9340
ispartof	IEEE transactions on computers, 2022-01, Vol.71 (8), p.1-1
issn	0018-9340 1557-9956
language	eng
recordid	cdi_ieee_primary_9531411
source	IEEE Electronic Library (IEL)
subjects	accelerators Artificial neural networks Computational modeling Computer architecture Convolution Convolutional neural networks Design optimization design space exploration Field programmable gate arrays FPGA Greedy algorithms Hardware Heuristic methods Kernel Multilayers octave convolution Power consumption Prototypes Search algorithms Signal processing algorithms
title	OctCNN: A High Throughput FPGA Accelerator for CNNs using Octave Convolution Algorithm
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T17%3A02%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=OctCNN:%20A%20High%20Throughput%20FPGA%20Accelerator%20for%20CNNs%20using%20Octave%20Convolution%20Algorithm&rft.jtitle=IEEE%20transactions%20on%20computers&rft.au=Lou,%20Wenqi&rft.date=2022-01-01&rft.volume=71&rft.issue=8&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=0018-9340&rft.eissn=1557-9956&rft.coden=ITCOB4&rft_id=info:doi/10.1109/TC.2021.3110413&rft_dat=%3Cproquest_RIE%3E2686301309%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2686301309&rft_id=info:pmid/&rft_ieee_id=9531411&rfr_iscdi=true