Weighted Quantization-Regularization in DNNs for Weight Memory Minimization Toward HW Implementation

Deployment of deep neural networks on hardware platforms is often constrained by limited on-chip memory and computational power. The proposed weight quantization offers the possibility of optimizing weight memory alongside transforming the weights to hardware-friendly data types. We apply dynamic fixed point (DFP) and power-of-two (Po2) quantization in conjunction with layer-wise precision scaling to minimize the weight memory.

Bibliographic details

Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018-11, Vol. 37 (11), p. 2929-2939
Main authors: Wess, Matthias; Dinakarrao, Sai Manoj Pudukotai; Jantsch, Axel
Format: Article
Language: English
Online access: Order full text
Detailed description

Deployment of deep neural networks on hardware platforms is often constrained by limited on-chip memory and computational power. The proposed weight quantization offers the possibility of optimizing weight memory alongside transforming the weights to hardware-friendly data types. We apply dynamic fixed point (DFP) and power-of-two (Po2) quantization in conjunction with layer-wise precision scaling to minimize the weight memory. To alleviate accuracy degradation due to precision scaling, we employ quantization-aware fine-tuning. For fine-tuning, quantization-regularization (QR) and weighted QR (WQR) are introduced to enforce the trained quantization by adding the distance of the weights to the desired quantization levels as a regularization term to the loss function. While DFP quantization performs better when different bit-widths are allowed for each layer, Po2 quantization in combination with retraining allows higher compression rates for equal bit-width quantization. The techniques are verified on an all-convolutional network. With an accuracy degradation of 0.10 percentage points, DFP with layer-wise precision scaling achieves compression ratios of 7.34 for CIFAR-10, 4.7 for CIFAR-100, and 9.33 for the SVHN dataset.
DOI: 10.1109/TCAD.2018.2857080
Publisher: IEEE, New York
ISSN: 0278-0070
EISSN: 1937-4151
Source: IEEE Electronic Library (IEL)
Subjects:
Artificial neural networks
Compression ratio
Computer memory
Convolutional neural networks
Degradation
Embedded systems
Hardware
Measurement
Memory management
memory minimization
Neural networks
Optimization
quantization
Quantization (signal)
Regularization
Retraining
Scaling
Task analysis
Training
Tuning
Weight