Exploring Multilevel Cache Hierarchies in Application Specific MPSoCs

Multiprocessor systems make use of multilevel cache hierarchies to improve overall memory access speed. Embedded systems typically use configurable processors, where the caches in the system can be customized for a given application or a set of applications. Finding the optimal or a near-optimal set...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on computer-aided design of integrated circuits and systems 2015-12, Vol.34 (12), p.1991-2003
Hauptverfasser: Nawinne, Isuru, Javaid, Haris, Ragel, Roshan, Radhakrishnan, Swarnalatha, Parameswaran, Sri
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Multiprocessor systems make use of multilevel cache hierarchies to improve overall memory access speed. Embedded systems typically use configurable processors, where the caches in the system can be customized for a given application or a set of applications. Finding the optimal or a near-optimal set size, block size, and associativity of each of the caches in a multilevel cache hierarchy is a challenging task due to the presence of billions or even trillions of design points. This paper presents an iterative exploration method to find suitable configurations for all the caches in the hierarchy of an application specific multiprocessor system-on-chip, to improve memory access speed. We propose an algorithm and combine it with the use of specialized hardware for parallel cache simulation to enable multiple back-and-forth iterations through the cache levels. In every iteration, our algorithm explores selected portions of the entire design space to quickly converge upon the final design point. We demonstrate our methodology on two- and three-level cache hierarchies with private and shared caches in a quad-core system, respectively, consisting of 5.4 billion and 10.4 trillion design points. Our method was able to find design points with up to 18.9% lower average memory access time while reducing total cache size by up to 74.15%, compared to a state-of-the-art noniterative method. The number of design points explored was 4× higher in our method, which is still a mere 3.6 × 10 -5 % of the entire design space, and took 6.08 h.
ISSN:0278-0070
1937-4151
DOI:10.1109/TCAD.2015.2445736