Simulating quantized inference on convolutional neural networks
Mobile and embedded applications of convolutional neural networks (CNNs) use quantization to reduce model size and increase computational efficiency. However, working with quantized networks often implies using non-standard training and execution methods, as modern frameworks offer limited support for fixed-point operations. We propose a quantization approach that simulates the effects of quantization in CNN inference without the network having to be re-implemented in fixed-point arithmetic, reducing the overhead and complexity of evaluating how existing networks respond to quantization. The proposed method provides a fast way of performing post-training quantization with different bit widths for activations and weights. Our experimental results on ImageNet CNNs show a model size reduction of more than 50% while maintaining classification accuracy without retraining. We also measured the relationship between classification complexity and tolerance to quantization, finding an inverse correlation between quantization level and dataset complexity.
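The abstract's central idea, evaluating a trained network's response to quantization by simulating fixed-point effects while still executing in floating point, can be pictured as a quantize-dequantize pass over weights and activations. The sketch below is a minimal illustration in PyTorch (the framework named in the article's highlights), assuming symmetric per-tensor scaling, forward hooks for activations, and helper names chosen here for clarity; it is not the authors' implementation.

```python
# Minimal sketch of simulated ("fake") post-training quantization:
# tensors are rounded to a fixed-point grid and immediately mapped back
# to float, so inference still runs in floating point while exhibiting
# the rounding error of the chosen bit width. Illustrative assumption only.
import torch
import torch.nn as nn


def quantize_dequantize(t: torch.Tensor, bits: int) -> torch.Tensor:
    """Round a float tensor to a symmetric fixed-point grid, then return it as float."""
    qmax = 2 ** (bits - 1) - 1                        # e.g. 127 for 8 bits
    scale = t.detach().abs().max().clamp(min=1e-8) / qmax
    return (t / scale).round().clamp(-qmax, qmax) * scale


def simulate_quantization(model: nn.Module, weight_bits: int = 8, act_bits: int = 8) -> nn.Module:
    """Quantize-dequantize conv/linear weights in place and register forward
    hooks that apply the same transform to each layer's output activations."""
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            with torch.no_grad():
                module.weight.copy_(quantize_dequantize(module.weight, weight_bits))
            # Returning a tensor from a forward hook replaces the layer's output.
            module.register_forward_hook(
                lambda m, inp, out, b=act_bits: quantize_dequantize(out, b)
            )
    return model
```

Under these assumptions, calling `simulate_quantization` on a pretrained ImageNet model and running the usual float evaluation loop would give the kind of post-training, bit-width-sweep experiment the abstract describes, without any fixed-point reimplementation.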
Saved in:
Published in: | Computers & electrical engineering 2021-10, Vol.95, p.107446, Article 107446 |
---|---|
Main authors: | Finotti, Vitor ; Albertini, Bruno |
Format: | Article |
Language: | English |
Subjects: | Artificial neural networks ; Classification ; Complexity ; Convolutional neural networks ; Fixed point arithmetic ; Inference ; Measurement ; Neural networks ; Post-training quantization ; Size reduction ; Training |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | 107446 |
container_title | Computers & electrical engineering |
container_volume | 95 |
creator | Finotti, Vitor ; Albertini, Bruno |
description | Mobile and embedded applications of convolutional neural networks (CNNs) use quantization to reduce model size and increase computational efficiency. However, working with quantized networks often implies using non-standard training and execution methods, as modern frameworks offer limited support to fixed-point operations. We propose a quantization approach simulating the effects of quantization in CNN inference without needing to be re-implemented using fixed-point arithmetic, reducing overhead and complexity in evaluating existing networks’ responses to quantization. The proposed method provides a fast way of performing post-training quantization with different bit widths in activations and weights. Our experimental results on ImageNet CNNs show a model size reduction of more than 50%, while maintaining classification accuracy without a need for retraining. We also measured the relationship between classification complexity and tolerance to quantization, finding an inverse correlation between quantization level and dataset complexity.
•Simulation of fixed-point quantization inference in convolutional neural networks.•Quantization of convolutional neural network inference in PyTorch.•Model size reduction on ImageNet architectures by more than 50% without accuracy loss.•CNN architectures classifying simple datasets are more tolerant to quantization. |
doi_str_mv | 10.1016/j.compeleceng.2021.107446 |
format | Article |
identifier | ISSN: 0045-7906 |
ispartof | Computers & electrical engineering, 2021-10, Vol.95, p.107446, Article 107446 |
issn | 0045-7906 1879-0755 |
language | eng |
publisher | Amsterdam: Elsevier Ltd |
source | Elsevier ScienceDirect Journals Complete |
subjects | Artificial neural networks ; Classification ; Complexity ; Convolutional neural networks ; Fixed point arithmetic ; Inference ; Measurement ; Neural networks ; Post-training quantization ; Size reduction ; Training |
title | Simulating quantized inference on convolutional neural networks |