Simulating quantized inference on convolutional neural networks
Mobile and embedded applications of convolutional neural networks (CNNs) use quantization to reduce model size and increase computational efficiency. However, working with quantized networks often implies using non-standard training and execution methods, as modern frameworks offer limited support for fixed-point operations. We propose a quantization approach that simulates the effects of quantization in CNN inference without the network having to be re-implemented in fixed-point arithmetic, reducing the overhead and complexity of evaluating how existing networks respond to quantization. The proposed method provides a fast way of performing post-training quantization with different bit widths for activations and weights. Our experimental results on ImageNet CNNs show a model size reduction of more than 50% while maintaining classification accuracy without retraining. We also measured the relationship between classification complexity and tolerance to quantization, finding an inverse correlation between quantization level and dataset complexity.
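The abstract's central idea, evaluating a trained network's response to quantization by simulating fixed-point effects while still executing in floating point, can be pictured as a quantize-dequantize pass over weights and activations. The sketch below is a minimal illustration in PyTorch (the framework named in the article's highlights), assuming symmetric per-tensor scaling, forward hooks for activations, and helper names chosen here for clarity; it is not the authors' implementation.

```python
# Minimal sketch of simulated ("fake") post-training quantization:
# tensors are rounded to a fixed-point grid and immediately mapped back
# to float, so inference still runs in floating point while exhibiting
# the rounding error of the chosen bit width. Illustrative assumption only.
import torch
import torch.nn as nn


def quantize_dequantize(t: torch.Tensor, bits: int) -> torch.Tensor:
    """Round a float tensor to a symmetric fixed-point grid, then return it as float."""
    qmax = 2 ** (bits - 1) - 1                        # e.g. 127 for 8 bits
    scale = t.detach().abs().max().clamp(min=1e-8) / qmax
    return (t / scale).round().clamp(-qmax, qmax) * scale


def simulate_quantization(model: nn.Module, weight_bits: int = 8, act_bits: int = 8) -> nn.Module:
    """Quantize-dequantize conv/linear weights in place and register forward
    hooks that apply the same transform to each layer's output activations."""
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            with torch.no_grad():
                module.weight.copy_(quantize_dequantize(module.weight, weight_bits))
            # Returning a tensor from a forward hook replaces the layer's output.
            module.register_forward_hook(
                lambda m, inp, out, b=act_bits: quantize_dequantize(out, b)
            )
    return model
```

Under these assumptions, calling `simulate_quantization` on a pretrained ImageNet model and running the usual float evaluation loop would give the kind of post-training, bit-width-sweep experiment the abstract describes, without any fixed-point reimplementation.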
Saved in:
Published in: | Computers & electrical engineering 2021-10, Vol.95, p.107446, Article 107446 |
---|---|
Main authors: | Finotti, Vitor ; Albertini, Bruno |
Format: | Article |
Language: | English |
Subjects: | Artificial neural networks ; Classification ; Complexity ; Convolutional neural networks ; Fixed point arithmetic ; Inference ; Measurement ; Neural networks ; Post-training quantization ; Size reduction ; Training |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | 107446 |
container_title | Computers & electrical engineering |
container_volume | 95 |
creator | Finotti, Vitor ; Albertini, Bruno |
description | Mobile and embedded applications of convolutional neural networks (CNNs) use quantization to reduce model size and increase computational efficiency. However, working with quantized networks often implies using non-standard training and execution methods, as modern frameworks offer limited support to fixed-point operations. We propose a quantization approach simulating the effects of quantization in CNN inference without needing to be re-implemented using fixed-point arithmetic, reducing overhead and complexity in evaluating existing networks’ responses to quantization. The proposed method provides a fast way of performing post-training quantization with different bit widths in activations and weights. Our experimental results on ImageNet CNNs show a model size reduction of more than 50%, while maintaining classification accuracy without a need for retraining. We also measured the relationship between classification complexity and tolerance to quantization, finding an inverse correlation between quantization level and dataset complexity.
•Simulation of fixed-point quantization inference in convolutional neural networks.•Quantization of convolutional neural network inference in PyTorch.•Model size reduction on ImageNet architectures by more than 50% without accuracy loss.•CNN architectures classifying simple datasets are more tolerant to quantization. |
doi_str_mv | 10.1016/j.compeleceng.2021.107446 |
format | Article |
identifier | ISSN: 0045-7906 |
ispartof | Computers & electrical engineering, 2021-10, Vol.95, p.107446, Article 107446 |
issn | 0045-7906 1879-0755 |
language | eng |
publisher | Amsterdam: Elsevier Ltd |
source | Elsevier ScienceDirect Journals Complete |
subjects | Artificial neural networks ; Classification ; Complexity ; Convolutional neural networks ; Fixed point arithmetic ; Inference ; Measurement ; Neural networks ; Post-training quantization ; Size reduction ; Training |
title | Simulating quantized inference on convolutional neural networks |