Exploring Multilevel Cache Hierarchies in Application Specific MPSoCs
Published in: | IEEE transactions on computer-aided design of integrated circuits and systems, 2015-12, Vol. 34 (12), p. 1991-2003 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | English |
Abstract: | Multiprocessor systems make use of multilevel cache hierarchies to improve overall memory access speed. Embedded systems typically use configurable processors, where the caches in the system can be customized for a given application or a set of applications. Finding the optimal or a near-optimal set size, block size, and associativity of each of the caches in a multilevel cache hierarchy is a challenging task due to the presence of billions or even trillions of design points. This paper presents an iterative exploration method to find suitable configurations for all the caches in the hierarchy of an application specific multiprocessor system-on-chip, to improve memory access speed. We propose an algorithm and combine it with the use of specialized hardware for parallel cache simulation to enable multiple back-and-forth iterations through the cache levels. In every iteration, our algorithm explores selected portions of the entire design space to quickly converge upon the final design point. We demonstrate our methodology on two- and three-level cache hierarchies with private and shared caches in a quad-core system, consisting of 5.4 billion and 10.4 trillion design points, respectively. Our method was able to find design points with up to 18.9% lower average memory access time while reducing total cache size by up to 74.15%, compared to a state-of-the-art noniterative method. The number of design points explored was 4× higher in our method, which is still a mere 3.6 × 10^-5 % of the entire design space, and took 6.08 h. |
---|---|
ISSN: | 0278-0070, 1937-4151 |
DOI: | 10.1109/TCAD.2015.2445736 |
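
The abstract refers to two quantities that a short sketch may make more concrete: the size of the design space (the product of the number of configurations available for each cache) and the average memory access time of a multilevel hierarchy. The Python below is only an illustration under stated assumptions. The parameter ranges, latencies, and miss rates are hypothetical and are not taken from the paper, the function names are invented for this example, and the access-time formula is the standard textbook recursion, which may differ from the metric the authors actually compute.

```python
from itertools import product


def cache_configs(set_sizes, block_sizes, associativities):
    """Enumerate every (sets, block_bytes, ways) combination for one cache."""
    return list(product(set_sizes, block_sizes, associativities))


def design_space_size(per_cache_counts):
    """Total design points: product of the configuration counts of all caches."""
    total = 1
    for count in per_cache_counts:
        total *= count
    return total


def amat(hit_times, miss_rates, memory_latency):
    """Textbook average memory access time for a multilevel hierarchy.

    hit_times[i] and miss_rates[i] describe cache level i (level 0 is closest
    to the core). Misses at the last level pay memory_latency.
    """
    penalty = memory_latency
    # Fold from the last cache level back towards the core.
    for hit, miss in zip(reversed(hit_times), reversed(miss_rates)):
        penalty = hit + miss * penalty
    return penalty


if __name__ == "__main__":
    # Hypothetical option ranges, chosen only for illustration.
    options = cache_configs(
        set_sizes=[2 ** k for k in range(4, 10)],
        block_sizes=[16, 32, 64],
        associativities=[1, 2, 4, 8],
    )
    # Assumed layout: four private L1 caches plus one shared L2 cache.
    print(design_space_size([len(options)] * 5))
    # Assumed latencies (cycles) and local miss rates for a two-level hierarchy.
    print(amat(hit_times=[1, 10], miss_rates=[0.1, 0.5], memory_latency=100))
```

Even with these modest assumed ranges, the product grows very quickly with the number of caches, which is consistent with the abstract's point that exhaustive exploration is impractical.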