Simulating quantized inference on convolutional neural networks

Mobile and embedded applications of convolutional neural networks (CNNs) use quantization to reduce model size and increase computational efficiency. However, working with quantized networks often implies using non-standard training and execution methods, as modern frameworks offer limited support for fixed-point operations. We propose a quantization approach that simulates the effects of quantization in CNN inference without requiring the network to be re-implemented in fixed-point arithmetic, reducing the overhead and complexity of evaluating how existing networks respond to quantization. The proposed method provides a fast way of performing post-training quantization with different bit widths for activations and weights. Our experimental results on ImageNet CNNs show a model size reduction of more than 50% while maintaining classification accuracy without retraining. We also measured the relationship between classification complexity and tolerance to quantization, finding an inverse correlation between quantization level and dataset complexity.

Highlights:
• Simulation of fixed-point quantization inference in convolutional neural networks.
• Quantization of convolutional neural network inference in PyTorch.
• Model size reduction on ImageNet architectures by more than 50% without accuracy loss.
• CNN architectures classifying simple datasets are more tolerant to quantization.
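The core idea described in the abstract, keeping inference in floating point while reproducing the numerical effect of a lower bit width, can be illustrated with a short sketch. The PyTorch snippet below is not the authors' implementation: it assumes uniform symmetric per-tensor quantization with round-to-nearest, and every name in it (fake_quantize, ActivationQuant, quantize_model) is hypothetical.

import torch
import torch.nn as nn

def fake_quantize(x: torch.Tensor, bits: int) -> torch.Tensor:
    # Round x onto a uniform symmetric grid with 2**bits levels, then map back to float.
    qmax = 2 ** (bits - 1) - 1                    # e.g. 127 for 8 bits
    scale = x.abs().max().clamp(min=1e-8) / qmax  # per-tensor scale (an assumption here)
    return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

class ActivationQuant(nn.Module):
    # Wraps a layer so its output (the activation) is fake-quantized to `bits` bits.
    def __init__(self, layer: nn.Module, bits: int):
        super().__init__()
        self.layer = layer
        self.bits = bits

    def forward(self, x):
        return fake_quantize(self.layer(x), self.bits)

@torch.no_grad()
def quantize_model(model: nn.Module, weight_bits: int = 8, act_bits: int = 8) -> nn.Module:
    # Fake-quantize weights in place and wrap conv/linear layers so their outputs
    # are also fake-quantized; weight and activation bit widths can differ.
    for name, child in model.named_children():
        if isinstance(child, (nn.Conv2d, nn.Linear)):
            child.weight.copy_(fake_quantize(child.weight, weight_bits))
            setattr(model, name, ActivationQuant(child, act_bits))
        else:
            quantize_model(child, weight_bits, act_bits)
    return model

Applied to a pretrained ImageNet model, for example quantize_model(torchvision.models.resnet18(weights="IMAGENET1K_V1"), weight_bits=8, act_bits=8), a sketch like this lets an unmodified floating-point framework measure how classification accuracy responds to different bit widths, which is the kind of evaluation the abstract describes; storing the rounded weights in integer form is what would yield the reported reduction in model size.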

Bibliographic details
Published in: Computers & Electrical Engineering, 2021-10, Vol. 95, p. 107446, Article 107446
Main authors: Finotti, Vitor; Albertini, Bruno
Format: Article
Language: English
Subjects: Artificial neural networks; Classification; Complexity; Convolutional neural networks; Fixed-point arithmetic; Inference; Measurement; Neural networks; Post-training quantization; Size reduction; Training
ISSN: 0045-7906
EISSN: 1879-0755
DOI: 10.1016/j.compeleceng.2021.107446
Online access: Full text