Hardware-Driven Nonlinear Activation for Stochastic Computing Based Deep Convolutional Neural Networks
Main Authors: , , , , , , ,
Format: Article
Language: English
Subjects:
Online Access: Order full text
Summary: Recently, Deep Convolutional Neural Networks (DCNNs) have made unprecedented progress, achieving accuracy close to, or even better than, human-level perception in various tasks. There is a timely need to map the latest software DCNNs to application-specific hardware in order to achieve orders-of-magnitude improvements in performance, energy efficiency, and compactness. Stochastic Computing (SC), as a low-cost alternative to the conventional binary computing paradigm, has the potential to enable massively parallel and highly scalable hardware implementations of DCNNs. One major challenge in SC-based DCNNs is designing accurate nonlinear activation functions, which have a significant impact on network-level accuracy but cannot be implemented accurately by existing SC computing blocks. In this paper, we design and optimize SC-based neurons, and we propose highly accurate activation designs for the three most frequently used activation functions in software DCNNs, i.e., hyperbolic tangent, logistic, and rectified linear units. Experimental results on LeNet-5 using the MNIST dataset demonstrate that, compared with a binary ASIC hardware DCNN, the DCNN with the proposed SC neurons achieves up to 61X, 151X, and 2X improvements in area, power, and energy, respectively, at the cost of a small precision degradation. In addition, the SC approach achieves area savings of up to 21X and 41X, power savings of up to 41X and 72X, and energy savings of up to 198200X and 96443X compared with CPU and GPU approaches, respectively, while the error increases by less than 3.07%. ReLU activation is suggested for future SC-based DCNNs given its superior performance at small bit-stream lengths.
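To make the SC paradigm described in the summary concrete, the following is a minimal Python simulation sketch of two standard SC building blocks: bipolar bit-stream encoding with XNOR-based multiplication, and the classic FSM-based stochastic tanh ("Stanh") of Brown and Card. It illustrates the general computing model only, not the activation designs proposed in the paper; the function names, stream length, and state count are assumptions made for this example.

```python
# Illustrative SC primitives: bipolar encoding, XNOR multiplication, and the
# classic FSM-based stochastic tanh (Brown-Card "Stanh"). This is NOT the
# paper's proposed neuron design; names and parameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def to_bipolar_stream(x, length):
    """Encode x in [-1, 1] as a bipolar stochastic bit stream:
    P(bit = 1) = (x + 1) / 2, so x = 2 * mean(bits) - 1."""
    return (rng.random(length) < (x + 1) / 2).astype(np.uint8)

def from_bipolar_stream(bits):
    """Decode a bipolar stream back to a real value in [-1, 1]."""
    return 2 * bits.mean() - 1

def sc_multiply(a_bits, b_bits):
    """Bipolar SC multiplication is a single XNOR gate per bit pair."""
    return np.uint8(1) - (a_bits ^ b_bits)

def sc_tanh(bits, n_states=8):
    """FSM-based stochastic tanh (Stanh): a saturating up/down counter
    with n_states states approximates tanh(n_states * x / 2). The output
    bit is 1 while the counter sits in the upper half of its states."""
    state = n_states // 2
    out = np.empty_like(bits)
    for i, b in enumerate(bits):
        state = min(state + 1, n_states - 1) if b else max(state - 1, 0)
        out[i] = 1 if state >= n_states // 2 else 0
    return out

# Example: multiply 0.5 by -0.4, then squash with the SC tanh block.
L = 4096  # bit-stream length; accuracy improves as the stream grows
a, b = to_bipolar_stream(0.5, L), to_bipolar_stream(-0.4, L)
prod = sc_multiply(a, b)
print("product ~", from_bipolar_stream(prod))           # approx. -0.2
print("tanh    ~", from_bipolar_stream(sc_tanh(prod)))  # approx. tanh(-0.8)
```

Because a bipolar value x maps to P(bit = 1) = (x + 1)/2, multiplying two independent streams reduces to a single XNOR gate, which is the source of SC's area and power advantage; the trade-off is that accuracy depends on stream length, which is why the abstract highlights ReLU's strong performance at small bit-stream lengths.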
DOI: 10.48550/arxiv.1703.04135