Universal Adder Neural Networks
Format: Article
Language: English (eng)
Abstract: Compared with the cheap addition operation, multiplication is of much higher computational complexity. The widely used convolutions in deep neural networks are exactly cross-correlations that measure the similarity between input features and convolution filters, which involves massive multiplications between floating-point values. In this paper, we present adder networks (AdderNets) to trade these massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs. In AdderNets, we take the $\ell_1$-norm distance between the filters and the input features as the output response. We first develop a theoretical foundation for AdderNets by showing that both the single-hidden-layer AdderNet and the width-bounded deep AdderNet with ReLU activation functions are universal function approximators. An approximation bound for AdderNets with a single hidden layer is also presented. We further analyze the influence of this new similarity measure on the optimization of neural networks and develop a special training scheme for AdderNets. Based on the gradient magnitude, an adaptive learning rate strategy is proposed to enhance the training procedure of AdderNets. AdderNets achieve 75.7% Top-1 accuracy and 92.3% Top-5 accuracy with ResNet-50 on the ImageNet dataset without any multiplication in the convolutional layers.
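The core operation is simple to state concretely: where a standard convolution computes the multiply-accumulate cross-correlation $\sum x \cdot w$ over each patch, an adder layer outputs the negative $\ell_1$ distance $-\sum |x - w|$. The sketch below is a minimal PyTorch illustration of that idea, assuming an `unfold`-based patch extraction; the class name `AdderConv2d` and its interface are hypothetical, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdderConv2d(nn.Module):
    """Sketch of an adder layer: the output response is the negative
    l1 distance between each filter and each input patch, replacing the
    multiply-accumulate of an ordinary convolution."""

    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        super().__init__()
        self.kernel_size, self.stride, self.padding = kernel_size, stride, padding
        self.weight = nn.Parameter(
            torch.randn(out_channels, in_channels, kernel_size, kernel_size))

    def forward(self, x):
        n, _, h, w = x.shape
        # Sliding patches: (n, c*k*k, L), with L spatial positions per image.
        patches = F.unfold(x, self.kernel_size, padding=self.padding, stride=self.stride)
        filters = self.weight.view(self.weight.size(0), -1)  # (out, c*k*k)
        # Negative l1 distance between every filter and every patch:
        # only subtractions, absolute values, and additions in the response.
        out = -(patches.unsqueeze(1) - filters.unsqueeze(0).unsqueeze(-1)).abs().sum(dim=2)
        h_out = (h + 2 * self.padding - self.kernel_size) // self.stride + 1
        w_out = (w + 2 * self.padding - self.kernel_size) // self.stride + 1
        return out.view(n, -1, h_out, w_out)  # (n, out, h_out, w_out)
```

The broadcasted difference materializes an (n, out, c·k², L) tensor, so this version trades memory for readability; a practical kernel would fuse the subtraction and reduction.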
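The adaptive learning rate mentioned in the abstract can be sketched in the same spirit. Gradients flowing into adder layers have a different magnitude profile than those of multiplicative layers, and the AdderNet line of work scales each layer's step as $\eta\sqrt{k}/\lVert\nabla F_l\rVert_2$, with $k$ the number of weights in layer $l$. The helper below assumes that form; it is illustrative, not the paper's exact recipe.

```python
import torch

def adder_layer_lr_scale(grad: torch.Tensor, eta: float = 0.1) -> float:
    """Per-layer learning-rate multiplier from the gradient magnitude
    (assumed form: eta * sqrt(k) / ||grad||_2, k = number of weights),
    so layers with small gradients take proportionally larger steps."""
    k = grad.numel()
    return eta * (k ** 0.5) / (grad.norm(p=2).item() + 1e-12)

# Illustrative use inside a training step (adder_layers is hypothetical):
# for layer in adder_layers:
#     scale = adder_layer_lr_scale(layer.weight.grad)
#     layer.weight.data.add_(layer.weight.grad, alpha=-base_lr * scale)
```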
DOI: 10.48550/arxiv.2105.14202