Structured Term Pruning for Computational Efficient Neural Networks Inference

State-of-the-art convolutional neural network accelerators show a growing interest in exploiting bit-level sparsity and eliminating the ineffectual computations on zero bits. However, excessive redundancy and the irregular distribution of nonzero bits limit the real speedup achieved in these accelerators. To address this, we propose an algorithm-architecture codesign, named structured term pruning (STP), to boost the computational efficiency of neural network inference. Specifically, we enhance bit sparsity by guiding the weights toward values with fewer power-of-two terms. Then, we structure the terms with layer-wise group budgets. Retraining is adopted to recover the accuracy drop. We also design the hardware of the group processing element and the fast signed-digit encoder for an efficient implementation of STP networks. The system design of STP is realized with a few simple alterations to an input-stationary systolic array design. Extensive evaluation results demonstrate that STP significantly reduces inference computation costs, achieving 2.35× computational energy savings for the ResNet18 network on the ImageNet dataset.
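The abstract's core idea (represent each weight with fewer signed power-of-two terms, then cap the total number of terms per weight group) can be made concrete with a short sketch. The code below is not the paper's implementation: the non-adjacent-form encoding, the helper names (naf_digits, terms, prune_group), and the greedy keep-largest-terms budget rule are assumptions chosen only to make the idea runnable; the authors' actual training-time guidance, budget allocation, and retraining loop are described only at the level quoted above.

```python
# Illustrative sketch of term pruning with a group budget (assumptions noted above).
from typing import List, Tuple

def naf_digits(n: int) -> List[int]:
    """Signed-digit (non-adjacent form) encoding of an integer weight.

    Returns digits in {-1, 0, 1}, least-significant first, so that
    n == sum(d * 2**i for i, d in enumerate(digits)). NAF minimizes the
    number of nonzero power-of-two terms needed to represent a value.
    """
    digits = []
    while n != 0:
        if n % 2:
            d = 2 - (n % 4)   # +1 or -1, chosen so the next bit becomes 0
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2               # exact: n is even at this point
    return digits

def terms(n: int) -> List[Tuple[int, int]]:
    """The (position, sign) power-of-two terms of a weight."""
    return [(i, d) for i, d in enumerate(naf_digits(n)) if d != 0]

def prune_group(weights: List[int], budget: int) -> List[int]:
    """Keep at most `budget` power-of-two terms across a whole weight group.

    Greedy rule (an assumption, not necessarily the paper's): keep the
    largest-magnitude terms first, so the error from dropped terms stays
    small; retraining would then recover the remaining accuracy drop.
    """
    all_terms = [(pos, sign, w_idx)
                 for w_idx, w in enumerate(weights)
                 for pos, sign in terms(w)]
    all_terms.sort(key=lambda t: t[0], reverse=True)  # largest 2**pos first
    pruned = [0] * len(weights)
    for pos, sign, w_idx in all_terms[:budget]:
        pruned[w_idx] += sign * (1 << pos)
    return pruned

if __name__ == "__main__":
    group = [7, -23, 96, 1]   # e.g. a group of quantized weights
    print(sum(len(terms(w)) for w in group))   # 8 terms before pruning
    print(prune_group(group, budget=4))        # [8, -32, 96, 0]
```

Running the example shows the tradeoff the abstract describes: the group [7, -23, 96, 1] needs 8 signed power-of-two terms exactly, but under a group budget of 4 the smallest terms are dropped and the weights round to [8, -32, 96, 0], which is the kind of perturbation the retraining step would compensate for.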

Bibliographic Details

Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2023-01, Vol. 42 (1), pp. 190-203
Main Authors: Huang, Kai; Li, Bowen; Chen, Siang; Claesen, Luc; Xi, Wei; Chen, Junjian; Jiang, Xiaowen; Liu, Zhili; Xiong, Dongliang; Yan, Xiaolang
Format: Article
Language: English
DOI: 10.1109/TCAD.2022.3168506
ISSN: 0278-0070
EISSN: 1937-4151
Subjects:
Accelerators
Algorithm-architecture codesign
Algorithms
Artificial neural networks
Biological neural networks
Co-design
Coders
compression and acceleration
Computational efficiency
Encoding
Hardware
Inference
Neural networks
quantization
Quantization (signal)
Redundancy
Sparsity
Systems design
systolic array (SA)
Systolic arrays
Training