SoBS-X: Squeeze-Out Bit Sparsity for ReRAM-Crossbar-Based Neural Network Accelerator

The resistive random-access-memory (ReRAM) crossbar is a promising substrate for deep neural network (DNN) accelerators, thanks to its in-memory and in-situ analog computing ability for vector-matrix multiply-accumulate operations (VMMs). However, it is challenging for the crossbar architecture to exploit the sparsity in DNNs: fine-grained sparsity is inevitably complex and costly to exploit because of the tightly coupled crossbar structure. As a countermeasure, we develop a novel ReRAM-based DNN accelerator, the sparse-multiplication-engine (SME), built on a hardware/software co-design framework. First, we orchestrate the bit-sparse pattern to increase the density of bit sparsity on top of existing quantization methods. Such quantized weights can be generated with alternating direction method of multipliers (ADMM) optimization during DNN fine-tuning, which exactly enforces the desired bit patterns in the weights. Second, we propose a novel weight-mapping mechanism that slices the bits of a weight across crossbars and splices the activation results in the peripheral circuits. This mechanism decouples the tightly coupled crossbar structure and accumulates sparsity within each crossbar. Finally, a squeeze-out scheme empties the crossbars whose mapped bit slices are left highly sparse by the previous two steps. We design the SME architecture and discuss its use with other quantization methods and different ReRAM cell technologies. We further propose a workload-grouping algorithm and a pipeline that balance the workload among crossbar rows executing multiply-accumulate operations concurrently, optimizing system latency. Putting it all together, with the optimized model, the SME shrinks crossbar usage by up to 8.7× and 2.1× for ResNet-50 and MobileNet-v2, respectively, compared with prior state-of-the-art designs, and achieves an average 3.1× speedup with little or no accuracy loss on ImageNet.
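To make the bit-slicing and squeeze-out ideas from the abstract concrete, the sketch below is a minimal illustration, not the authors' implementation: the function names, the 4-bit toy weight format, and the NumPy modelling of crossbar VMMs are all our assumptions. It slices an integer weight matrix into per-bit planes (as they would be mapped onto separate crossbars), squeezes out all-zero planes, and splices the surviving shifted partial products back into the full vector-matrix product.

```python
import numpy as np

def bit_slice(weights: np.ndarray, n_bits: int = 4) -> list[np.ndarray]:
    """Split a non-negative integer weight matrix into per-bit binary planes."""
    w = weights.astype(np.uint32)
    return [(w >> b) & 1 for b in range(n_bits)]

def squeeze_out(planes: list[np.ndarray]) -> tuple[list[np.ndarray], list[int]]:
    """Keep only the bit planes (crossbar slices) with at least one nonzero cell."""
    kept = [(b, p) for b, p in enumerate(planes) if p.any()]
    return [p for _, p in kept], [b for b, _ in kept]

# Toy weights whose binary codes share zero bit positions -- the kind of
# bit-sparse pattern the ADMM fine-tuning step is said to encourage.
w = np.array([[8, 0],
              [8, 8]])                      # 0b1000: only bit 3 is ever set
planes, idx = squeeze_out(bit_slice(w))
print(len(planes), idx)                     # -> 1 [3]: 3 of 4 planes squeezed out

# Splice: the per-plane partial VMMs, shifted by their bit weight,
# reproduce the full vector-matrix product.
x = np.array([1, 2])
y_spliced = sum((1 << b) * (p @ x) for p, b in zip(planes, idx))
assert np.array_equal(y_spliced, w @ x)     # [8, 24]
```

On real hardware the shift-and-add splice would live in the peripheral circuits, and squeezed-out planes simply never occupy a crossbar; the sketch only mirrors the arithmetic.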

Bibliographic Details

Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2023-01, Vol. 42 (1), pp. 204-217
Authors: Liu, Fangxin; Wang, Zongwu; Chen, Yongbiao; He, Zhezhi; Yang, Tao; Liang, Xiaoyao; Jiang, Li
Format: Article
Language: English
DOI: 10.1109/TCAD.2022.3172907
ISSN: 0278-0070
EISSN: 1937-4151
Publisher: New York: IEEE
Source: IEEE Electronic Library (IEL)
Subjects:
Accelerator
Algorithms
Artificial neural networks
Co-design
Computer architecture
Hardware
Mathematical analysis
Matrix algebra
Measurement
Microprocessors
Multiplication
Network latency
neural network
Neural networks
Optimization
Pipelines
Quantization (signal)
Random access memory
resistive random-access-memory (ReRAM)
Routing
Sparsity
Virtual machine monitors
Workload
Workloads