POLYBiNN: Binary Inference Engine for Neural Networks using Decision Trees
Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs) have gained significant popularity in several classification and regression applications. The massive computation and memory requirements of DNN and CNN architectures pose particular challenges for their FPGA implementation. Moreov...
Gespeichert in:
Veröffentlicht in: | Journal of signal processing systems 2020, Vol.92 (1), p.95-107 |
---|---|
Hauptverfasser: | , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs) have gained significant popularity in several classification and regression applications. The massive computation and memory requirements of DNN and CNN architectures pose particular challenges for their FPGA implementation. Moreover, programming FPGAs requires hardware-specific knowledge that many machine-learning researchers do not possess. To make the power and versatility of FPGAs available to a wider deep learning user community and to improve DNN design efficiency, we introduce POLYBiNN, an efficient FPGA-based inference engine for DNNs and CNNs. POLYBiNN is composed of a stack of decision trees, which are binary classifiers in nature, and it utilizes AND-OR gates instead of multipliers and accumulators. POLYBiNN is a memory-free inference engine that drastically cuts hardware costs. We also propose a tool for the automatic generation of a low-level hardware description of the trained POLYBiNN for a given application. We evaluate POLYBiNN and the tool for several datasets that are normally solved using fully connected layers. On the MNIST dataset, when implemented in a ZYNQ-7000 ZC706 FPGA, the system achieves a throughput of up to 100 million image classifications per second with 90 ns latency and 97.26
%
accuracy. Moreover, POLYBiNN consumes 8× less power than the best previously published implementations, and it does not require any memory access. We also show how POLYBiNN can be used instead of the fully connected layers of a CNN and apply this approach to the CIFAR-10 dataset. |
---|---|
ISSN: | 1939-8018 1939-8115 |
DOI: | 10.1007/s11265-019-01453-w |