S2Engine: A Novel Systolic Architecture for Sparse Convolutional Neural Networks

Convolutional neural networks (CNNs) have achieved great success in performing cognitive tasks. However, execution of CNNs requires a large amount of computing resources and generates heavy memory traffic, which imposes a severe challenge on computing system design. Through optimizing parallel executions and data reuse in convolution, systolic architecture demonstrates great advantages in accelerating CNN computations. However, regular internal data transmission path in traditional systolic architecture prevents the systolic architecture from completely leveraging the benefits introduced by neural network sparsity. Deployment of fine-grained sparsity on the existing systolic architectures is greatly hindered by the incurred computational overheads. In this work, we propose S2Engine \(-\) a novel systolic architecture that can fully exploit the sparsity in CNNs with maximized data reuse. S2Engine transmits compressed data internally and allows each processing element to dynamically select an aligned data from the compressed dataflow in convolution. Compared to the naive systolic array, S2Engine achieves about \(3.2\times\) and about \(3.0\times\) improvements on speed and energy efficiency, respectively.
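The abstract's key idea, a processing element that works directly on compressed sparse streams and only computes on aligned operands, can be sketched in a toy form. This is an illustrative model only, not the actual S2Engine microarchitecture; the function names and the (index, value) compression format are assumptions for the example.

```python
# Toy sketch (not the authors' design): a single PE consumes two
# zero-compressed, index-sorted streams of (index, value) pairs and
# fires a multiply-accumulate only when a weight and an activation
# share the same index -- the "aligned data selection" the abstract
# alludes to. Mismatched entries are skipped, so zeros cost nothing.

def compress(vec):
    """Drop zeros, keeping (index, value) pairs in index order."""
    return [(i, v) for i, v in enumerate(vec) if v != 0]

def sparse_pe_dot(weights, acts):
    """Merge two compressed streams; MAC only on matching indices.
    Returns (accumulated dot product, number of MACs actually issued)."""
    w, a = compress(weights), compress(acts)
    wi = ai = macs = acc = 0
    while wi < len(w) and ai < len(a):
        iw, vw = w[wi]
        ia, va = a[ai]
        if iw == ia:          # operands aligned: do the MAC
            acc += vw * va
            macs += 1
            wi += 1
            ai += 1
        elif iw < ia:         # weight has no matching activation: skip
            wi += 1
        else:                 # activation has no matching weight: skip
            ai += 1
    return acc, macs

weights = [0, 2, 0, 0, 5, 0, 1, 0]
acts    = [3, 0, 0, 4, 2, 0, 6, 0]
acc, macs = sparse_pe_dot(weights, acts)
print(acc, macs, len(weights))  # only 2 MACs issued instead of 8
```

Here only indices 4 and 6 carry nonzeros in both streams, so the PE issues 2 MACs where a dense systolic PE would issue 8; the ratio of dense to issued MACs is a rough intuition for where speedups of the reported magnitude can come from.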

Detailed Description

Saved in:
Bibliographic Details
Published in: arXiv.org 2021-06
Main Authors: Yang, Jianlei; Fu, Wenzhi; Cheng, Xingzhou; Ye, Xucheng; Dai, Pengcheng; Zhao, Weisheng
Format: Article
Language: eng
Subjects:
Online Access: Full text
container_title arXiv.org
creator Yang, Jianlei
Fu, Wenzhi
Cheng, Xingzhou
Ye, Xucheng
Dai, Pengcheng
Zhao, Weisheng
description Convolutional neural networks (CNNs) have achieved great success in performing cognitive tasks. However, execution of CNNs requires a large amount of computing resources and generates heavy memory traffic, which imposes a severe challenge on computing system design. Through optimizing parallel executions and data reuse in convolution, systolic architecture demonstrates great advantages in accelerating CNN computations. However, regular internal data transmission path in traditional systolic architecture prevents the systolic architecture from completely leveraging the benefits introduced by neural network sparsity. Deployment of fine-grained sparsity on the existing systolic architectures is greatly hindered by the incurred computational overheads. In this work, we propose S2Engine \(-\) a novel systolic architecture that can fully exploit the sparsity in CNNs with maximized data reuse. S2Engine transmits compressed data internally and allows each processing element to dynamically select an aligned data from the compressed dataflow in convolution. Compared to the naive systolic array, S2Engine achieves about \(3.2\times\) and about \(3.0\times\) improvements on speed and energy efficiency, respectively.
doi 10.48550/arxiv.2106.07894
format Article
publisher Ithaca: Cornell University Library, arXiv.org
creationdate 2021-06-15
published_version https://doi.org/10.1109/TC.2021.3087946
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2021-06
issn 2331-8422
language eng
recordid cdi_arxiv_primary_2106_07894
source arXiv.org; Free E-Journals
subjects Artificial neural networks
Cognitive tasks
Computation
Computer architecture
Computer Science - Distributed, Parallel, and Cluster Computing
Computer Science - Hardware Architecture
Computer Science - Learning
Convolution
Data transmission
Design optimization
Neural networks
Sparsity
Systems design
title S2Engine: A Novel Systolic Architecture for Sparse Convolutional Neural Networks