On-chip traffic regulation to reduce coherence protocol cost on a microthreaded many-core architecture with distributed caches

Bibliographic details
Published in: ACM Transactions on Embedded Computing Systems, 2014-03, Vol. 13 (3s), p. 1-21
Main authors: Yang, Qiang; Fu, Jian; Poss, Raphael; Jesshope, Chris
Format: Article
Language: English
Online access: Full text
Description
Summary: When hardware cache coherence scales to many cores on a chip, oversaturated traffic in the shared memory system may offset the benefit of massive hardware concurrency. In this article, we investigate the cost of a write-update protocol in terms of on-chip memory network traffic, and its adverse effects on system performance, on a multithreaded many-core architecture with distributed caches. We discuss possible software and hardware solutions to alleviate the network pressure. We find that, in the context of massive concurrency, adding a write-merging buffer with 0.46% area overhead to each core improves the performance of applications with good locality and concurrency by 18.74% on average. Other applications also benefit from this addition and achieve a throughput increase of 5.93%. In addition, this improvement indicates that higher levels of concurrency per core can be exploited without impacting performance, thus tolerating latency better and yielding higher processor efficiency than other solutions.
ISSN: 1539-9087, 1558-3465
DOI: 10.1145/2567931