DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
Format: Article
Language: English
Abstract: We propose DoReFa-Net, a method to train convolutional neural networks that have low bitwidth weights and activations using low bitwidth parameter gradients. In particular, during the backward pass, parameter gradients are stochastically quantized to low bitwidth numbers before being propagated to convolutional layers. As convolutions during the forward and backward passes can now operate on low bitwidth weights and activations/gradients respectively, DoReFa-Net can use bit convolution kernels to accelerate both training and inference. Moreover, as bit convolutions can be efficiently implemented on CPU, FPGA, ASIC, and GPU, DoReFa-Net opens the way to accelerating the training of low bitwidth neural networks on such hardware. Our experiments on the SVHN and ImageNet datasets show that DoReFa-Net can achieve prediction accuracy comparable to that of its 32-bit counterparts. For example, a DoReFa-Net derived from AlexNet with 1-bit weights and 2-bit activations can be trained from scratch using 6-bit gradients to reach 46.1% top-1 accuracy on the ImageNet validation set. The DoReFa-Net AlexNet model is released publicly.
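The stochastic gradient quantization summarized in the abstract can be sketched in a few lines. The NumPy snippet below is a minimal illustration under stated assumptions, not the released implementation: it uses the paper's k-bit quantizer quantize_k(x) = round((2^k - 1) x) / (2^k - 1) on [0, 1], affinely maps the gradient into [0, 1] around its maximum magnitude, and adds zero-mean uniform noise one quantization step wide before rounding; the per-minibatch-sample maximum used in the paper is simplified here to a whole-tensor maximum, and the clipping step is an assumption to keep the noisy input inside [0, 1].

```python
import numpy as np

def quantize_k(x, k):
    # Deterministic k-bit quantizer for x in [0, 1]:
    # round to the nearest of 2^k evenly spaced levels.
    n = 2 ** k - 1
    return np.round(x * n) / n

def quantize_gradient(dr, k, rng):
    # Stochastically quantize a gradient tensor dr to k bits.
    m = np.max(np.abs(dr))  # whole-tensor max (simplification of the paper's per-sample max)
    if m == 0.0:
        return dr           # all-zero gradient: nothing to quantize
    # Zero-mean uniform noise, one quantization step wide.
    noise = (rng.random(dr.shape) - 0.5) / (2 ** k - 1)
    # Map dr into [0, 1], perturb, clip (assumption), quantize, map back to [-m, m].
    x = np.clip(dr / (2.0 * m) + 0.5 + noise, 0.0, 1.0)
    return 2.0 * m * (quantize_k(x, k) - 0.5)

rng = np.random.default_rng(0)
g = rng.standard_normal((4, 4)).astype(np.float32)
print(quantize_gradient(g, k=6, rng=rng))  # e.g. the 6-bit gradients cited above
```

The added noise makes the rounding unbiased on average, which is what lets training proceed on gradients as narrow as 6 bits in the reported AlexNet experiment.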
DOI: 10.48550/arxiv.1606.06160