System-Level Optimization of Accelerator Local Memory for Heterogeneous Systems-on-Chip

Bibliographic details
Published in:	IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2017-03, Vol. 36 (3), p. 435-448
Main authors:	Pilato, Christian; Mantovani, Paolo; Di Guglielmo, Giuseppe; Carloni, Luca P.
Format:	Article
Language:	English
Subjects:	Accelerators; Algorithm design and analysis; Complexity; Data structures; Hardware; Hardware accelerator; High level synthesis; high-level synthesis (HLS); IP networks; memory design; Memory management; Methodology; multibank architecture; Optimization; Performance enhancement; Random access memory; System on chip

Description
Abstract:	In modern system-on-chip architectures, specialized accelerators are increasingly used to improve performance and energy efficiency. The growing complexity of these systems requires the use of system-level design methodologies featuring high-level synthesis (HLS) for generating these components efficiently. Existing HLS tools, however, have limited support for the system-level optimization of memory elements, which typically occupy most of the accelerator area. We present a complete methodology for designing the private local memories (PLMs) of multiple accelerators. Based on the memory requirements of each accelerator, our methodology automatically determines an area-efficient architecture for the PLMs to guarantee performance and reduce the memory cost based on technology-related information. We implemented a prototype tool, called Mnemosyne, that embodies our methodology within a commercial HLS flow. We designed 13 complex accelerators for selected applications from two recently released benchmark suites (Perfect and CortexSuite). With our approach, we are able to reduce the memory cost of single accelerators by up to 45%. Moreover, when reusing memory IPs across accelerators, we achieve area savings that range between 17% and 55% compared to the case where the PLMs are designed separately.
ISSN:	0278-0070, 1937-4151
DOI:	10.1109/TCAD.2016.2611506