Selective Flexibility: Creating Domain-Specific Reconfigurable Arrays

Bibliographic details
Published in:	IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2013-05, Vol. 32 (5), pp. 681-694
Main authors:	Stojilovic, M., Novo, D., Saranovac, L., Brisk, P., Ienne, P.
Format:	Article
Language:	English
Subjects:	Acceleration; Application specific integrated circuits; Binary trees; Datapaths; domain-specific customization; Field programmable gate arrays; flexibility; FPGA routing; Heuristic algorithms; Merging; reconfigurable arrays; Routing

Description
Abstract:	Historically, hardware acceleration technologies have either been application-specific, therefore lacking in flexibility, or fully programmable, thereby suffering from notable inefficiencies on an application-by-application basis. To address the growing need for domain-specific acceleration technologies, this paper describes a design methodology (i) to automatically generate a domain-specific coarse-grained array from a set of representative applications and (ii) to introduce limited forms of architectural generality to increase the likelihood that additional applications can be successfully mapped onto it. In particular, coarse-grained arrays generated using our approach are intended to be integrated into customizable processors that use application-specific instruction set extensions to accelerate performance and reduce energy; rather than implementing these extensions using application-specific integrated circuit (ASIC) logic, which lacks flexibility, they can be synthesized onto our reconfigurable array instead, allowing the processor to be used for a variety of applications in related domains. Results show that our array is around 2× slower and 15× larger than an ultimately efficient ASIC implementation, and thus far more efficient than field-programmable gate arrays (FPGAs), which are known to be 3-4× slower and 20-40× larger. Additionally, we estimate that our array is usually around 2× larger and 2× slower than an accelerator synthesized using traditional datapath merging, which has very limited flexibility, if any, beyond the design set of data-flow graphs (DFGs).
ISSN:	0278-0070, 1937-4151
DOI:	10.1109/TCAD.2012.2235127