Resource-Efficient Optimization for FPGA-Based Convolution Accelerator

Convolution is one of the most essential operations in FPGA-based hardware accelerators. However, existing designs often neglect the inherent architecture of the FPGA, which poses a severe challenge for hardware resources. Even though some previous works have proposed approximate multi...

Bibliographic Details
Published in:	Electronics (Basel) 2023-10, Vol.12 (20), p.4333
Main authors:	Ma, Yanhua; Xu, Qican; Song, Zerui
Format:	Article
Language:	English
Subjects:	Accuracy; Algorithms; Convolution; Design and construction; Digital integrated circuits; Energy efficiency; Field programmable gate arrays; Hardware; Mathematical optimization; Methods; Multiplication; Multipliers; Neural networks; Optimization; Performance degradation; Reduction; Resource allocation
Online access:	Full text

container_end_page
container_issue	20
container_start_page	4333
container_title	Electronics (Basel)
container_volume	12
creator	Ma, Yanhua; Xu, Qican; Song, Zerui
description	Convolution is one of the most essential operations in FPGA-based hardware accelerators. However, existing designs often neglect the inherent architecture of the FPGA, which poses a severe challenge for hardware resources. Even though some previous works have proposed approximate multipliers or convolution acceleration algorithms to deal with this issue, the inevitable accuracy loss and resource occupation easily lead to performance degradation. To address this, we first propose two kinds of resource-efficient, optimized, accurate multipliers based on LUTs or carry chains. Then, targeting FPGA-based platforms, a generic multiply–accumulate structure is constructed by directly accumulating the partial products produced by our proposed optimized radix-4 Booth multipliers, without forming intermediate multiplication and addition results. Experimental results demonstrate that our proposed multiplier reduces look-up-table (LUT) usage by up to 51% compared to the Vivado area-optimized multiplier IP. Furthermore, the convolutional processing unit using the proposed structure achieves a 36% LUT reduction compared to existing methods. As case studies, the proposed method is applied to the DCT, LeNet, and MobileNet-V3 to save hardware resources without loss of accuracy.
doi_str_mv	10.3390/electronics12204333
format	Article
date	2023-10-01
oa	free_for_read
orcid	https://orcid.org/0000-0002-2254-3896
publisher	Basel: MDPI AG
rights	COPYRIGHT 2023 MDPI AG. 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
toplevel	peer_reviewed; online_resources
fulltext	fulltext
identifier	ISSN: 2079-9292
ispartof	Electronics (Basel), 2023-10, Vol.12 (20), p.4333
issn	2079-9292 2079-9292
language	eng
recordid	cdi_proquest_journals_2882557290
source	MDPI - Multidisciplinary Digital Publishing Institute; EZB-FREE-00999 freely available EZB journals
subjects	Accuracy; Algorithms; Convolution; Design and construction; Digital integrated circuits; Energy efficiency; Field programmable gate arrays; Hardware; Mathematical optimization; Methods; Multiplication; Multipliers; Neural networks; Optimization; Performance degradation; Reduction; Resource allocation
title	Resource-Efficient Optimization for FPGA-Based Convolution Accelerator
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T03%3A35%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Resource-Efficient%20Optimization%20for%20FPGA-Based%20Convolution%20Accelerator&rft.jtitle=Electronics%20(Basel)&rft.au=Ma,%20Yanhua&rft.date=2023-10-01&rft.volume=12&rft.issue=20&rft.spage=4333&rft.pages=4333-&rft.issn=2079-9292&rft.eissn=2079-9292&rft_id=info:doi/10.3390/electronics12204333&rft_dat=%3Cgale_proqu%3EA772062580%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2882557290&rft_id=info:pmid/&rft_galeid=A772062580&rfr_iscdi=true
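
The abstract describes a multiply–accumulate structure that accumulates radix-4 Booth partial products directly instead of finishing each product first. The record does not give the authors' LUT or carry-chain mapping, so the following is only a behavioural sketch of textbook radix-4 Booth recoding in Python, not the paper's design. The function names (`booth_radix4_digits`, `booth_partial_products`, `mac_direct`) are hypothetical and chosen for illustration.

```python
def booth_radix4_digits(y: int, bits: int) -> list[int]:
    """Recode a signed two's-complement value into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2}; digit i has weight 4**i.
    """
    assert -(1 << (bits - 1)) <= y < (1 << (bits - 1)), "y out of range"
    if bits % 2:
        bits += 1  # sign-extend to an even width so bits pair up
    u = y & ((1 << bits) - 1)  # two's-complement bit pattern of y
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)  # standard radix-4 Booth digit
        prev = b1
    return digits


def booth_partial_products(x: int, y: int, bits: int) -> list[int]:
    """One partial product per Booth digit: digit * x, shifted by 2 bits per digit."""
    return [(d * x) << (2 * i) for i, d in enumerate(booth_radix4_digits(y, bits))]


def mac_direct(xs: list[int], ws: list[int], bits: int) -> int:
    """Dot product that adds every partial product into a single accumulator.

    No per-element product is formed; this models (in software only) the idea
    of feeding all partial products into one accumulation structure.
    """
    acc = 0
    for x, w in zip(xs, ws):
        for pp in booth_partial_products(x, w, bits):
            acc += pp
    return acc
```

For an 8-bit operand the recoding yields four partial products rather than the eight rows of a plain shift-and-add multiplier, and the result stays exact, which is the property the abstract contrasts with approximate multipliers. Any resource figures depend on the hardware mapping, which this sketch does not model.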