Structured Dynamic Precision for Deep Neural Networks Quantization

Deep Neural Networks (DNNs) have achieved remarkable success in various Artificial Intelligence applications. Quantization is a critical step in DNN compression and acceleration for deployment. To further boost DNN execution efficiency, many works exploit input-dependent redundancy with dynamic quantization for different regions. However, the sensitive regions in the feature map are irregularly distributed, which restricts the real speedup on existing accelerators. To this end, we propose an algorithm-architecture co-design, named Structured Dynamic Precision (SDP). Specifically, we propose a quantization scheme in which the high-order bit part and the low-order bit part of data can be masked independently, and a fixed number of term parts are dynamically selected for computation based on the importance of each term in the group. We also present a hardware design that supports the algorithm efficiently with small overheads and whose inference time scales mainly in proportion to the precision. Evaluation experiments on extensive networks demonstrate that, compared to the state-of-the-art dynamic quantization accelerator DRQ, our SDP achieves a 29% performance gain and a 51% energy reduction at the same level of model accuracy.
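As a loose illustration of the idea the abstract describes — splitting each quantized value into independently maskable high- and low-order bit parts, then keeping only a fixed number of parts per group — here is a toy NumPy sketch. The function name, the group size, and the keep-largest-parts heuristic are assumptions made for illustration; this is not the paper's actual SDP algorithm or hardware scheme.

```python
import numpy as np

def sdp_select_parts(x, group_size=4, parts_per_group=2, part_bits=4, total_bits=8):
    """Toy sketch (hypothetical helper): split each 8-bit value into a
    high-order and a low-order 4-bit part, then within each group of
    values keep only a fixed number of parts, chosen by their numeric
    contribution. Everything else is zeroed out."""
    mask = (1 << part_bits) - 1
    q = np.clip(np.round(x), 0, (1 << total_bits) - 1).astype(np.int64).ravel()
    out = np.zeros_like(q)
    for g in range(0, q.size, group_size):
        seg = q[g:g + group_size]
        high = (seg >> part_bits) << part_bits     # high-order parts, value-weighted
        low = seg & mask                           # low-order parts
        cand = np.concatenate([high, low])         # 2 candidate parts per value
        keep = np.argsort(cand)[::-1][:parts_per_group]  # largest contributions
        kept = np.zeros_like(cand)
        kept[keep] = cand[keep]
        # recombine the surviving high and low parts of each value
        out[g:g + group_size] = kept[:seg.size] + kept[seg.size:]
    return out

# With 2 of 8 parts kept, only the two largest bit parts in the group survive:
# sdp_select_parts(np.array([200.0, 3.0, 15.0, 130.0])) -> [192, 0, 0, 128]
```

Because high-order parts carry more numeric weight, they tend to win the per-group budget, which mirrors the abstract's point that precision is spent where it matters while the amount of computation per group stays fixed.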

Detailed Description

Saved in:
Bibliographic Details
Published in: ACM Transactions on Design Automation of Electronic Systems, 2023-01, Vol. 28 (1), p. 1-24, Article 12
Main authors: Huang, Kai; Li, Bowen; Xiong, Dongliang; Jiang, Haitian; Jiang, Xiaowen; Yan, Xiaolang; Claesen, Luc; Liu, Dehong; Chen, Junjian; Liu, Zhili
Format: Article
Language: English
Online access: Full text
DOI: 10.1145/3549535
ISSN: 1084-4309
EISSN: 1557-7309
Source: ACM Digital Library Complete
Subjects: Computer systems organization; Computer vision; Computing methodologies; Hardware; Neural networks; Power and energy