Structured Dynamic Precision for Deep Neural Networks Quantization
Deep Neural Networks (DNNs) have achieved remarkable success in various Artificial Intelligence applications. Quantization is a critical step in DNN compression and acceleration for deployment. To further boost DNN execution efficiency, many works explore leveraging input-dependent redundancy with dynamic quantization for different regions. However, the sensitive regions in the feature map are irregularly distributed, which restricts the real speedup for existing accelerators. To this end, we propose an algorithm-architecture co-design named Structured Dynamic Precision (SDP). Specifically, we propose a quantization scheme in which the high-order bit part and the low-order bit part of data can be masked independently, and a fixed number of term parts is dynamically selected for computation based on the importance of each term in the group. We also present a hardware design that enables the algorithm efficiently with small overheads and whose inference time scales mainly in proportion to the precision. Evaluation experiments on extensive networks demonstrate that, compared to the state-of-the-art dynamic quantization accelerator DRQ, our SDP achieves a 29% performance gain and a 51% energy reduction at the same level of model accuracy.
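The core mechanism in the abstract can be illustrated with a small sketch. The NumPy code below is a hypothetical rendering of the idea as stated, not the paper's implementation: it splits unsigned 8-bit values into a high-order and a low-order 4-bit term part, then, within each group of activations, keeps only a fixed number of term parts, ranked by an importance score. The 4-bit part width, the group size of 4, the keep count, and the summed-magnitude importance metric are all assumptions, and the names `split_terms` and `sdp_quantize` are invented for the example.

```python
# Hypothetical sketch of the SDP masking idea; part width, group size,
# and the importance metric are assumptions, not the paper's design.
import numpy as np

PART_BITS = 4   # assumed width of each term part
N_PARTS = 2     # high-order part and low-order part of an 8-bit value


def split_terms(x):
    """Split unsigned 8-bit values into their bit parts, high-order first."""
    mask = (1 << PART_BITS) - 1
    parts = [((x >> (i * PART_BITS)) & mask) << (i * PART_BITS)
             for i in reversed(range(N_PARTS))]
    return np.stack(parts)  # shape: (N_PARTS, len(x))


def sdp_quantize(x, group_size=4, keep=1):
    """For each group of values, keep only the `keep` most important term
    parts (ranked by summed magnitude) and mask the rest to zero."""
    parts = split_terms(x)
    out = np.zeros_like(x)
    for g in range(0, x.size, group_size):
        block = parts[:, g:g + group_size]
        importance = block.sum(axis=1)        # one score per term part
        top = np.argsort(importance)[-keep:]  # indices of parts to keep
        out[g:g + group_size] = block[top].sum(axis=0)
    return out


x = np.array([3, 200, 17, 90, 255, 6, 128, 64], dtype=np.uint16)
print(sdp_quantize(x))  # each group keeps only its dominant bit part
```

On this toy input both groups keep their high-order parts, but a group of small values would keep its low-order parts instead; that input-dependent selection, with a fixed number of parts kept per group, is what makes the precision both dynamic and structured.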
Published in: | ACM Transactions on Design Automation of Electronic Systems, 2023-01, Vol. 28 (1), pp. 1-24, Article 12 |
---|---|
Main Authors: | Huang, Kai; Li, Bowen; Xiong, Dongliang; Jiang, Haitian; Jiang, Xiaowen; Yan, Xiaolang; Claesen, Luc; Liu, Dehong; Chen, Junjian; Liu, Zhili |
Format: | Article |
Language: | English |
Subjects: | Computer systems organization; Computer vision; Computing methodologies; Hardware; Neural networks; Power and energy |
Online Access: | Full text |
DOI: | 10.1145/3549535 |
ISSN: | 1084-4309 |
EISSN: | 1557-7309 |
Publisher: | New York, NY: ACM |