Hybrid Binary-Unary Hardware Accelerator

Stream-based computing such as stochastic computing has been used in recent years to create designs with significantly smaller area by harnessing unary encoding of data. However, the area saving comes at an exponential price in latency, making the area × delay cost unattractive. In this article, we...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on computers 2020-09, Vol.69 (9), p.1308-1319
Hauptverfasser: Faraji, S. Rasoul, Bazargan, Kia
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Stream-based computing such as stochastic computing has been used in recent years to create designs with significantly smaller area by harnessing unary encoding of data. However, the area saving comes at an exponential price in latency, making the area × delay cost unattractive. In this article, we present a novel method which uses a hybrid binary / unary representation to perform computations. We first divide the input range into a few sub-regions, perform unary computations on each sub-region individually, and finally pack the outputs of all sub-regions back to compact binary. Moreover, we propose a synthesis methodology and a regression model to predict an optimal or close-to-optimal design in the design space. To the best of our knowledge, we are the first to show a scalable method based on parallel bit-stream data representation that can beat conventional binary in terms of a real cost, i.e., area × delay and energy consumption in almost all functions that we tried at resolutions of 8-, 10-, and 12-bits. Our method outperforms the binary, stochastic, and fully unary methods on a number of functions, especially low-cost binary CORDIC-based functions, and on a common edge detection algorithm on FPGA and in ASIC implementation. In terms of area × delay cost, our {on FPGA, in ASIC} cost is on average only {4.72\% 4.72% , 24.36\% 24.36% } and {20.16\% 20.16% , 60.12\% 60.12% } of the parallel binary pipeline implementation at 8- and 10-bit resolution, respectively. These numbers are 2-3 orders of magnitude better than the results of t
ISSN:0018-9340
1557-9956
DOI:10.1109/TC.2020.2971596