Quantizing Convolutional Neural Networks for Low-Power High-Throughput Inference Engines

Deep learning as a means to inferencing has proliferated thanks to its versatility and ability to approach or exceed human-level accuracy. These computational models have seemingly insatiable appetites for computational resources not only while training, but also when deployed at scales ranging from data centers all the way down to embedded devices. As such, increasing consideration is being made to maximize the computational efficiency given limited hardware and energy resources and, as a result, inferencing with reduced precision has emerged as a viable alternative to the IEEE 754 Standard for Floating-Point Arithmetic. We propose a quantization scheme that allows inferencing to be carried out using arithmetic that is fundamentally more efficient when compared to even half-precision floating-point. Our quantization procedure is significant in that we determine our quantization scheme parameters by calibrating against its reference floating-point model using a single inference batch rather than (re)training and achieve end-to-end post quantization accuracies comparable to the reference model.
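The calibration idea in the abstract (deriving quantization parameters from a single inference batch instead of retraining) can be illustrated with a minimal sketch. The function names, the max-abs calibration statistic, and symmetric per-tensor int8 quantization below are illustrative assumptions, not the paper's actual scheme.

# Minimal sketch: single-batch post-training calibration for symmetric int8
# quantization. Names and the max-abs rule are assumptions for illustration.
import numpy as np

def calibrate_scale(tensor, num_bits=8):
    # Derive a symmetric quantization scale from one calibration batch.
    max_abs = np.max(np.abs(tensor))      # dynamic range observed in the batch
    qmax = 2 ** (num_bits - 1) - 1        # e.g. 127 for int8
    return max_abs / qmax if max_abs > 0 else 1.0

def quantize(tensor, scale, num_bits=8):
    # Map float values to integers using the calibrated scale.
    qmax = 2 ** (num_bits - 1) - 1
    return np.clip(np.round(tensor / scale), -qmax - 1, qmax).astype(np.int8)

def dequantize(q_tensor, scale):
    # Recover approximate float values to compare against the reference model.
    return q_tensor.astype(np.float32) * scale

# Usage: calibrate once on a single batch, then reuse the fixed scales at inference.
calibration_batch = np.random.randn(32, 3, 224, 224).astype(np.float32)
scale = calibrate_scale(calibration_batch)
q = quantize(calibration_batch, scale)
print("max reconstruction error:", np.max(np.abs(dequantize(q, scale) - calibration_batch)))

In practice one such scale (or a per-channel set of scales) would be calibrated for each layer's weights and activations; the sketch only shows the per-tensor case.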

Full description

Saved in:
Bibliographic Details
Published in: arXiv.org 2018-05
Main authors: Settle, Sean O, Bollavaram, Manasa, D'Alberto, Paolo, Elliott Delaye, Fernandez, Oscar, Fraser, Nicholas, Ng, Aaron, Sirasao, Ashish, Wu, Michael
Format: Article
Language: eng
Subjects:
Online access: Full text
container_title arXiv.org
creator Settle, Sean O
Bollavaram, Manasa
D'Alberto, Paolo
Elliott Delaye
Fernandez, Oscar
Fraser, Nicholas
Ng, Aaron
Sirasao, Ashish
Wu, Michael
description Deep learning as a means to inferencing has proliferated thanks to its versatility and ability to approach or exceed human-level accuracy. These computational models have seemingly insatiable appetites for computational resources not only while training, but also when deployed at scales ranging from data centers all the way down to embedded devices. As such, increasing consideration is being made to maximize the computational efficiency given limited hardware and energy resources and, as a result, inferencing with reduced precision has emerged as a viable alternative to the IEEE 754 Standard for Floating-Point Arithmetic. We propose a quantization scheme that allows inferencing to be carried out using arithmetic that is fundamentally more efficient when compared to even half-precision floating-point. Our quantization procedure is significant in that we determine our quantization scheme parameters by calibrating against its reference floating-point model using a single inference batch rather than (re)training and achieve end-to-end post quantization accuracies comparable to the reference model.
format Article
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2018-05
issn 2331-8422
language eng
recordid cdi_proquest_journals_2073512889
source Free E-Journals
subjects Artificial neural networks
Batch flotation
Computing time
Counting
Data centers
Electronic devices
Embedded systems
Energy sources
Floating point arithmetic
Inference
Machine learning
Mathematical models
Measurement
Model accuracy
Training
title Quantizing Convolutional Neural Networks for Low-Power High-Throughput Inference Engines
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T09%3A39%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Quantizing%20Convolutional%20Neural%20Networks%20for%20Low-Power%20High-Throughput%20Inference%20Engines&rft.jtitle=arXiv.org&rft.au=Settle,%20Sean%20O&rft.date=2018-05-21&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2073512889%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2073512889&rft_id=info:pmid/&rfr_iscdi=true