Structured Dynamic Precision for Deep Neural Networks Quantization
Deep Neural Networks (DNNs) have achieved remarkable success in various Artificial Intelligence applications. Quantization is a critical step in DNN compression and acceleration for deployment. To further boost DNN execution efficiency, many works explore leveraging input-dependent redundancy with dynamic quantization for different regions. However, the sensitive regions in the feature map are irregularly distributed, which restricts the real speedup for existing accelerators. To this end, we propose an algorithm-architecture co-design named Structured Dynamic Precision (SDP). Specifically, we propose a quantization scheme in which the high-order bit part and the low-order bit part of data can be masked independently, and a fixed number of term parts is dynamically selected for computation based on the importance of each term in the group. We also present a hardware design that enables the algorithm efficiently with small overheads and whose inference time scales mainly in proportion to the precision. Evaluation experiments on extensive networks demonstrate that, compared to the state-of-the-art dynamic quantization accelerator DRQ, our SDP achieves a 29% performance gain and a 51% energy reduction at the same level of model accuracy.
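The core mechanism in the abstract can be illustrated with a small sketch. The NumPy code below is a hypothetical rendering of the idea as stated, not the paper's implementation: it splits unsigned 8-bit values into a high-order and a low-order 4-bit term part, then, within each group of activations, keeps only a fixed number of term parts, ranked by an importance score. The 4-bit part width, the group size of 4, the keep count, and the summed-magnitude importance metric are all assumptions, and the names `split_terms` and `sdp_quantize` are invented for the example.

```python
# Hypothetical sketch of the SDP masking idea; part width, group size,
# and the importance metric are assumptions, not the paper's design.
import numpy as np

PART_BITS = 4   # assumed width of each term part
N_PARTS = 2     # high-order part and low-order part of an 8-bit value


def split_terms(x):
    """Split unsigned 8-bit values into their bit parts, high-order first."""
    mask = (1 << PART_BITS) - 1
    parts = [((x >> (i * PART_BITS)) & mask) << (i * PART_BITS)
             for i in reversed(range(N_PARTS))]
    return np.stack(parts)  # shape: (N_PARTS, len(x))


def sdp_quantize(x, group_size=4, keep=1):
    """For each group of values, keep only the `keep` most important term
    parts (ranked by summed magnitude) and mask the rest to zero."""
    parts = split_terms(x)
    out = np.zeros_like(x)
    for g in range(0, x.size, group_size):
        block = parts[:, g:g + group_size]
        importance = block.sum(axis=1)        # one score per term part
        top = np.argsort(importance)[-keep:]  # indices of parts to keep
        out[g:g + group_size] = block[top].sum(axis=0)
    return out


x = np.array([3, 200, 17, 90, 255, 6, 128, 64], dtype=np.uint16)
print(sdp_quantize(x))  # each group keeps only its dominant bit part
```

On this toy input both groups keep their high-order parts, but a group of small values would keep its low-order parts instead; that input-dependent selection, with a fixed number of parts kept per group, is what makes the precision both dynamic and structured.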
Published in: | ACM Transactions on Design Automation of Electronic Systems, 2023-01, Vol. 28 (1), pp. 1-24, Article 12 |
---|---|
Main Authors: | Huang, Kai; Li, Bowen; Xiong, Dongliang; Jiang, Haitian; Jiang, Xiaowen; Yan, Xiaolang; Claesen, Luc; Liu, Dehong; Chen, Junjian; Liu, Zhili |
Format: | Article |
Language: | English |
Subjects: | Computer systems organization; Computer vision; Computing methodologies; Hardware; Neural networks; Power and energy |
Online Access: | Full text |
DOI: | 10.1145/3549535 |
ISSN: | 1084-4309 |
EISSN: | 1557-7309 |
Publisher: | New York, NY: ACM |