DCP-CNN: Efficient Acceleration of CNNs With Dynamic Computing Parallelism on FPGA
Saved in:
Published in: | IEEE transactions on computer-aided design of integrated circuits and systems, 2024-07, p.1-1 |
---|---|
Main authors: | Dai, Kui ; Xie, Zheren ; Liu, Shuanglong |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 1 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | IEEE transactions on computer-aided design of integrated circuits and systems |
container_volume | |
creator | Dai, Kui ; Xie, Zheren ; Liu, Shuanglong |
description | Convolutional Neural Networks (CNNs) have demonstrated outstanding accuracy across a range of machine learning tasks. However, their huge computational cost limits their deployability in real-time applications. For this reason, parallel computing has been extensively employed to accelerate CNNs on parallel computing devices such as GPUs and FPGAs, by unrolling multiple loop operations of the convolutional layers. Nevertheless, existing CNN accelerators can hardly exploit the different parallelisms offered by CNN algorithms efficiently, since their degrees of parallelism are fixed across dimensions and layers. In this paper, we propose DCP-CNN, an FPGA-based CNN accelerator that implements CNNs with dynamic computing parallelism degrees. DCP-CNN employs a parallel computing architecture that dynamically allocates computing resources among the data dimensions of each layer according to the layer size, ensuring that all computing units work at full capacity and thus achieving optimal compute efficiency. Furthermore, to boost throughput, we propose a design space exploration (DSE) framework based on simulated annealing, which automatically generates the parallelism degrees for the different dimensions of the network layers according to the resource constraints and the CNN structure. On an Intel Stratix 10 GX650 FPGA, the proposed DCP-CNN achieves a throughput of more than 800 Gop/s and a compute efficiency of 72%~98%, outperforming existing state-of-the-art FPGA-based CNN accelerators. |
doi_str_mv | 10.1109/TCAD.2024.3435996 |
format | Article |
identifier | ISSN: 0278-0070 |
ispartof | IEEE transactions on computer-aided design of integrated circuits and systems, 2024-07, p.1-1 |
issn | 0278-0070 (ISSN) ; 1937-4151 (EISSN) |
language | eng |
recordid | cdi_crossref_primary_10_1109_TCAD_2024_3435996 |
source | IEEE Xplore |
subjects | Computer architecture ; Convolution ; Convolutional codes ; Convolutional neural networks ; Convolutional Neural Networks (CNNs) ; Field programmable gate arrays ; Field Programmable Gate Arrays (FPGAs) ; Hardware Accelerator ; Kernel ; Loop Unrolling ; Parallel Computing ; Parallel processing |
title | DCP-CNN: Efficient Acceleration of CNNs With Dynamic Computing Parallelism on FPGA |
url | https://ieeexplore.ieee.org/document/10616169 |
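The abstract above describes a design space exploration (DSE) step that uses simulated annealing to choose per-layer parallelism degrees under FPGA resource constraints. The sketch below is a minimal, hypothetical illustration of how such a search loop can be organized; the layer shapes, the candidate unroll factors, the DSP_BUDGET constant, and the cycle-count cost model are all assumptions made for illustration and are not taken from the paper.

```python
# Hypothetical sketch of a simulated-annealing DSE loop that picks per-layer
# parallelism degrees (unroll factors) under a multiplier budget. Layer shapes,
# the cost model, and the move strategy are illustrative assumptions only.
import math
import random

# (in_channels, out_channels, output_rows, output_cols, kernel) per layer --
# made-up VGG-like shapes for illustration.
LAYERS = [(3, 64, 224, 224, 3), (64, 128, 112, 112, 3), (128, 256, 56, 56, 3)]
DSP_BUDGET = 1024                      # assumed number of parallel multipliers
CANDIDATES = [1, 2, 4, 8, 16, 32, 64]  # allowed unroll factors per dimension


def cycles(layer, p_in, p_out):
    """Estimated cycles: loop bounds divided by the unroll factors (ceiling)."""
    cin, cout, rows, cols, k = layer
    return math.ceil(cin / p_in) * math.ceil(cout / p_out) * rows * cols * k * k


def cost(solution):
    """Total cycles, or infinity if any layer exceeds the multiplier budget."""
    total = 0
    for layer, (p_in, p_out) in zip(LAYERS, solution):
        if p_in * p_out > DSP_BUDGET:
            return float("inf")
        total += cycles(layer, p_in, p_out)
    return total


def neighbor(solution):
    """Perturb one randomly chosen layer's unroll factors."""
    new = list(solution)
    i = random.randrange(len(new))
    new[i] = (random.choice(CANDIDATES), random.choice(CANDIDATES))
    return new


def anneal(steps=20000, t_start=1e8, t_end=1e2):
    current = [(1, 1) for _ in LAYERS]          # trivially feasible start point
    cur_cost = cost(current)
    best, best_cost = current, cur_cost
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / steps)  # geometric cooling
        cand = neighbor(current)
        cand_cost = cost(cand)
        # Accept improvements always, worse moves with Boltzmann probability.
        if cand_cost < cur_cost or random.random() < math.exp((cur_cost - cand_cost) / t):
            current, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = current, cur_cost
    return best, best_cost


if __name__ == "__main__":
    factors, total_cycles = anneal()
    for layer, (p_in, p_out) in zip(LAYERS, factors):
        print(f"layer {layer[:2]}: P_in={p_in}, P_out={p_out}")
    print(f"estimated total cycles: {total_cycles:,}")
```

The geometric cooling schedule and the Boltzmann acceptance of worse moves are standard simulated-annealing choices; an actual flow such as the one the paper proposes would replace the simple `cycles` estimate with a hardware-accurate performance and resource model for the target FPGA.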