Optimization of Scatter Network Architectures and Bank Allocations for Sparse CNN Accelerators


Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE access 2022, Vol.10, p.1-1
Main Authors: Kim, Sunwoo, Park, Sungkyung, Park, Chester Sungchung
Format: Article
Language: eng
Subjects:
Online Access: Full text
container_end_page 1
container_issue
container_start_page 1
container_title IEEE access
container_volume 10
creator Kim, Sunwoo
Park, Sungkyung
Park, Chester Sungchung
description Sparse convolutional neural network (SCNN) accelerators eliminate unnecessary computations and memory accesses by exploiting zero-valued activation pixels and filter weights. However, data movement between the multiplier array and the accumulator buffer tends to be a performance bottleneck. Specifically, the scatter network, the core block of SCNN accelerators, delivers Cartesian products to the accumulator buffer, and certain products are not delivered immediately owing to bus contention. A previous SCNN-based architecture eliminates bus contention and significantly improves performance by making use of different dataflows; however, it relies only on weight sparsity, and its performance is highly workload-dependent. In this paper, we propose a novel scatter network architecture for SCNN accelerators. First, we propose network topologies (such as window and split queuing) that define the connections between the FIFOs and the crossbar buses in the scatter network. Second, we investigate arbitration algorithms (such as fixed priority, round-robin, and longest-queue-first) that define the priorities of the products delivered to the accumulator buffer. Optimizing the scatter network architecture alone, however, may not provide a sufficient performance gain, since it does not reduce bus contention itself. We therefore also propose a cubic-constrained bank allocation for the accumulator buffer, which reduces bus contention without any increase in hardware area. Based on the results of cycle-accurate simulation, register-transfer-level (RTL) design, and logic synthesis, this study investigates the trade-off between the performance and complexity of SCNN accelerators.
It is verified that, when the optimized SCNN accelerators are applied to AlexNet, the proposed scatter network architecture removes most of the performance degradation due to bus contention, thereby improving accelerator performance by 72% with an area increase of 18%. It is also shown that the proposed bank allocation provides an additional performance gain of up to 31% when applied to SqueezeNet. The proposed scatter network architectures and bank allocation can eliminate bus contention in most Cartesian-product-based accelerators, regardless of the workload, without changing any accelerator components other than the scatter network.
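The bus-contention problem the abstract describes can be illustrated with a toy model (this is a sketch for intuition, not the authors' cycle-accurate simulator): several FIFOs hold products whose target accumulator-buffer bank is known, each bank accepts at most one product per cycle, and an arbitration policy decides which contending FIFO wins. The FIFO count, bank count, and random bank targets below are invented for illustration; only fixed-priority and longest-queue-first are modeled, since round-robin would additionally need a rotating-pointer state per bank.

```python
import random
from collections import deque

random.seed(0)

NUM_FIFOS = 4   # multiplier-output FIFOs feeding the crossbar (illustrative)
NUM_BANKS = 4   # accumulator-buffer banks, one bus each (illustrative)

def drain_cycles(queues, policy):
    """Cycles needed to deliver all products when each bank accepts at
    most one product per cycle; losing contenders stall in their FIFO."""
    qs = [deque(q) for q in queues]
    cycles = 0
    while any(qs):
        cycles += 1
        # Group FIFOs by the bank their head product targets this cycle.
        contenders = {}
        for i, q in enumerate(qs):
            if q:
                contenders.setdefault(q[0], []).append(i)
        for bank, fifos in contenders.items():
            if policy == "fixed":
                winner = min(fifos)  # fixed priority: lowest index wins
            else:                    # "lqf": longest-queue-first
                winner = max(fifos, key=lambda i: len(qs[i]))
            qs[winner].popleft()     # delivered; the other contenders stall
    return cycles

# Random bank indices stand in for the bank targets of Cartesian products.
queues = [[random.randrange(NUM_BANKS) for _ in range(16)]
          for _ in range(NUM_FIFOS)]
print("fixed-priority cycles:     ", drain_cycles(queues, "fixed"))
print("longest-queue-first cycles:", drain_cycles(queues, "lqf"))
```

With no contention, 4 FIFOs of 16 products would drain in 16 cycles; any extra cycles reported are stalls caused by multiple head products targeting the same bank, which is exactly the loss the paper's scatter network topologies and bank allocation aim to reduce.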
doi_str_mv 10.1109/ACCESS.2022.3199010
format Article
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2022, Vol.10, p.1-1
issn 2169-3536
2169-3536
language eng
recordid cdi_proquest_journals_2704098645
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Accelerator
Accelerators
Accumulators
Algorithms
Allocations
Arbitration
Artificial neural networks
Buffers
Cartesian coordinates
Computer architecture
Convolutional neural networks
convolutional neural networks (CNNs)
cycle-accurate simulator
Data compression
dataflow
Logic synthesis
Network architecture
network on a chip (NoC)
Network topologies
Optimization
Performance degradation
Performance enhancement
Queueing
Resource management
Scattering
System-on-chip
Workload
Workloads
title Optimization of Scatter Network Architectures and Bank Allocations for Sparse CNN Accelerators
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T03%3A52%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Optimization%20of%20Scatter%20Network%20Architectures%20and%20Bank%20Allocations%20for%20Sparse%20CNN%20Accelerators&rft.jtitle=IEEE%20access&rft.au=Kim,%20Sunwoo&rft.date=2022&rft.volume=10&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2022.3199010&rft_dat=%3Cproquest_doaj_%3E2704098645%3C/proquest_doaj_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2704098645&rft_id=info:pmid/&rft_ieee_id=9857922&rft_doaj_id=oai_doaj_org_article_7a74c275913a4a02b4dc095fcaab1df8&rfr_iscdi=true