Bridging the computation gap between programmable processors and hardwired accelerators

Bibliographic details
Main authors: Fan, K., Kudlur, M., Dasika, G., Mahlke, S.
Format: Conference proceeding
Language: English
Description
Summary: New media and signal processing applications demand ever higher performance while operating within the tight power constraints of mobile devices. A range of hardware implementations is available to deliver computation with varying degrees of area and power efficiency, from general-purpose processors to application-specific integrated circuits (ASICs). The tradeoff of moving towards more efficient customized solutions such as ASICs is the lack of flexibility in terms of hardware reusability and programmability. In this paper, we propose a customized semi-programmable loop accelerator architecture that exploits the efficiency gains available through high levels of customization, while maintaining sufficient flexibility to execute multiple similar loops. A customized instance of the loop accelerator architecture is generated for a particular loop, and then the data and control paths are proactively generalized in an efficient manner to increase flexibility. A compiler mapping phase is then able to map other loops onto the same hardware. The efficiency of the programmable accelerator is compared with non-programmable accelerators and with the OpenRISC 1200 general-purpose processor. The programmable accelerator is able to achieve up to 34x better power efficiency and 30x better area efficiency than a simple general-purpose processor, while trading off as little as 2x power and area efficiency to the non-programmable accelerator.
ISSN: 1530-0897, 2378-203X
DOI: 10.1109/HPCA.2009.4798266