Quantizing Convolutional Neural Networks for Low-Power High-Throughput Inference Engines
Deep learning as a means to inferencing has proliferated thanks to its versatility and ability to approach or exceed human-level accuracy. These computational models have seemingly insatiable appetites for computational resources not only while training, but also when deployed at scales ranging from...
Saved in:
Published in: | arXiv.org 2018-05 |
---|---|
Main authors: | Settle, Sean O; Bollavaram, Manasa; D'Alberto, Paolo; Elliott Delaye; Fernandez, Oscar; Fraser, Nicholas; Ng, Aaron; Sirasao, Ashish; Wu, Michael |
Format: | Article |
Language: | eng |
Subjects: | Artificial neural networks; Batch flotation; Computing time; Counting; Data centers; Electronic devices; Embedded systems; Energy sources; Floating point arithmetic; Inference; Machine learning; Mathematical models; Measurement; Model accuracy; Training |
Online access: | Full text |
container_title | arXiv.org |
---|---|
creator | Settle, Sean O; Bollavaram, Manasa; D'Alberto, Paolo; Elliott Delaye; Fernandez, Oscar; Fraser, Nicholas; Ng, Aaron; Sirasao, Ashish; Wu, Michael |
description | Deep learning as a means to inferencing has proliferated thanks to its versatility and ability to approach or exceed human-level accuracy. These computational models have seemingly insatiable appetites for computational resources not only while training, but also when deployed at scales ranging from data centers all the way down to embedded devices. As such, increasing consideration is being made to maximize the computational efficiency given limited hardware and energy resources and, as a result, inferencing with reduced precision has emerged as a viable alternative to the IEEE 754 Standard for Floating-Point Arithmetic. We propose a quantization scheme that allows inferencing to be carried out using arithmetic that is fundamentally more efficient when compared to even half-precision floating-point. Our quantization procedure is significant in that we determine our quantization scheme parameters by calibrating against its reference floating-point model using a single inference batch rather than (re)training and achieve end-to-end post quantization accuracies comparable to the reference model. |
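The scheme described above is calibration-based: quantizer parameters are derived by running a single inference batch through the floating-point reference model, with no (re)training. The sketch below is a minimal illustration of that idea only; the symmetric max-abs scaling, the int8 target, and the function names (`calibrate_scale`, `quantize`, `dequantize`) are assumptions chosen for clarity, not the authors' actual procedure.

```python
import numpy as np

def calibrate_scale(activations: np.ndarray, num_bits: int = 8) -> float:
    """Derive a per-tensor scale from one calibration batch of float values."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for int8
    max_abs = float(np.max(np.abs(activations)))
    return max_abs / qmax if max_abs > 0 else 1.0

def quantize(x: np.ndarray, scale: float, num_bits: int = 8) -> np.ndarray:
    """Map float values to signed integers using the calibrated scale."""
    qmax = 2 ** (num_bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float values for comparison with the reference."""
    return q.astype(np.float32) * scale

# Calibrate against a stand-in for the reference model's activations from a
# single batch, then check that the round-trip quantization error stays small.
batch = np.random.randn(32, 64).astype(np.float32)
scale = calibrate_scale(batch)
err = np.abs(dequantize(quantize(batch, scale), scale) - batch).max()
print(f"scale={scale:.6f}  max abs error={err:.6f}  (bound: {scale / 2:.6f})")
```

With this kind of max-abs calibration the per-element reconstruction error is bounded by half a quantization step (scale/2), which is the sort of behavior that lets a post-training quantized model track the accuracy of its floating-point reference.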
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2018-05 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2073512889 |
source | Free E-Journals |
subjects | Artificial neural networks; Batch flotation; Computing time; Counting; Data centers; Electronic devices; Embedded systems; Energy sources; Floating point arithmetic; Inference; Machine learning; Mathematical models; Measurement; Model accuracy; Training |
title | Quantizing Convolutional Neural Networks for Low-Power High-Throughput Inference Engines |