DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients

We propose DoReFa-Net, a method to train convolutional neural networks that have low bitwidth weights and activations using low bitwidth parameter gradients. In particular, during the backward pass, parameter gradients are stochastically quantized to low bitwidth numbers before being propagated to convolutional layers.
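
The core trick above is the stochastic quantization of gradients during the backward pass. Below is a minimal NumPy sketch of the k-bit stochastic quantizer the abstract describes; the function names, the epsilon guard, and the clipping step are illustrative assumptions, not the authors' released code.

import numpy as np

def quantize_k(x, k):
    # Deterministic uniform quantizer to k bits on [0, 1].
    n = float(2 ** k - 1)
    return np.round(x * n) / n

def quantize_grad(dr, k, rng=None):
    # Stochastically quantize a gradient tensor dr to k bits:
    # scale dr into [0, 1], add uniform dither noise before rounding,
    # then map the quantized result back to the original range.
    rng = np.random.default_rng() if rng is None else rng
    scale = 2.0 * np.max(np.abs(dr)) + 1e-12          # epsilon avoids 0/0
    noise = (rng.uniform(size=dr.shape) - 0.5) / (2 ** k - 1)
    x = np.clip(dr / scale + 0.5 + noise, 0.0, 1.0)   # into [0, 1], dithered
    return scale * (quantize_k(x, k) - 0.5)           # back to original range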

Detailed Description

Bibliographic Details
Main Authors: Zhou, Shuchang; Wu, Yuxin; Ni, Zekun; Zhou, Xinyu; Wen, He; Zou, Yuheng
Format: Article
Language: English
Subjects:
Online Access: Order full text
creator Zhou, Shuchang; Wu, Yuxin; Ni, Zekun; Zhou, Xinyu; Wen, He; Zou, Yuheng
description We propose DoReFa-Net, a method to train convolutional neural networks that have low bitwidth weights and activations using low bitwidth parameter gradients. In particular, during the backward pass, parameter gradients are stochastically quantized to low bitwidth numbers before being propagated to convolutional layers. As convolutions during the forward/backward passes can now operate on low bitwidth weights and activations/gradients respectively, DoReFa-Net can use bit convolution kernels to accelerate both training and inference. Moreover, as bit convolutions can be efficiently implemented on CPU, FPGA, ASIC and GPU, DoReFa-Net opens the way to accelerating the training of low bitwidth neural networks on such hardware. Our experiments on the SVHN and ImageNet datasets show that DoReFa-Net can achieve prediction accuracy comparable to that of its 32-bit counterparts. For example, a DoReFa-Net derived from AlexNet that has 1-bit weights and 2-bit activations can be trained from scratch using 6-bit gradients to 46.1% top-1 accuracy on the ImageNet validation set. The DoReFa-Net AlexNet model is released publicly.
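
The "bit convolution kernels" mentioned above reduce a fixed-point dot product to AND and bitcount operations over bit-planes, which is why low bitwidth operands speed up both passes. A hedged sketch follows, assuming unsigned fixed-point inputs; the function name and the NumPy realization are illustrative, not the released implementation.

import numpy as np

def bit_dot(x, y, m_bits, k_bits):
    # Dot product of unsigned fixed-point vectors computed from bit-planes:
    # x . y = sum over (m, k) of 2^(m+k) * popcount(c_m(x) AND c_k(y)),
    # where c_m(x) is the vector of m-th bits of the elements of x.
    x = x.astype(np.uint64)
    y = y.astype(np.uint64)
    total = 0
    for m in range(m_bits):
        cm = (x >> m) & 1                              # m-th bit-plane of x
        for k in range(k_bits):
            ck = (y >> k) & 1                          # k-th bit-plane of y
            total += int(np.sum(cm & ck)) << (m + k)   # weighted popcount
    return total

# Sanity check: agrees with the ordinary dot product on 2-bit inputs.
assert bit_dot(np.array([3, 1]), np.array([2, 3]), 2, 2) == 9

On CPU, FPGA, ASIC, or GPU the inner popcount maps to native bit instructions, so the compute cost scales roughly with the product of the two operand bitwidths.
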
doi_str_mv 10.48550/arxiv.1606.06160
format Article
identifier DOI: 10.48550/arxiv.1606.06160
language eng
recordid cdi_arxiv_primary_1606_06160
source arXiv.org
subjects Computer Science - Learning
Computer Science - Neural and Evolutionary Computing
title DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T13%3A42%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=DoReFa-Net:%20Training%20Low%20Bitwidth%20Convolutional%20Neural%20Networks%20with%20Low%20Bitwidth%20Gradients&rft.au=Zhou,%20Shuchang&rft.date=2016-06-20&rft_id=info:doi/10.48550/arxiv.1606.06160&rft_dat=%3Carxiv_GOX%3E1606_06160%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true