Selective Flexibility: Creating Domain-Specific Reconfigurable Arrays
Published in: | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2013-05, Vol. 32 (5), p. 681-694 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | English |
Abstract: | Historically, hardware acceleration technologies have either been application-specific, therefore lacking in flexibility, or fully programmable, thereby suffering from notable inefficiencies on an application-by-application basis. To address the growing need for domain-specific acceleration technologies, this paper describes a design methodology (i) to automatically generate a domain-specific coarse-grained array from a set of representative applications and (ii) to introduce limited forms of architectural generality to increase the likelihood that additional applications can be successfully mapped onto it. In particular, coarse-grained arrays generated using our approach are intended to be integrated into customizable processors that use application-specific instruction set extensions to accelerate performance and reduce energy; rather than implementing these extensions using application-specific integrated circuit (ASIC) logic, which lacks flexibility, they can be synthesized onto our reconfigurable array instead, allowing the processor to be used for a variety of applications in related domains. Results show that our array is around 2× slower and 15× larger than an ultimately efficient ASIC implementation, and thus far more efficient than field-programmable gate arrays (FPGAs), which are known to be 3-4× slower and 20-40× larger. Additionally, we estimate that our array is usually around 2× larger and 2× slower than an accelerator synthesized using traditional datapath merging, which has very limited flexibility, if any, beyond the design set of data-flow graphs (DFGs). |
---|---|
ISSN: | 0278-0070, 1937-4151 |
DOI: | 10.1109/TCAD.2012.2235127 |