Cache performance in vector supercomputers




Bibliographic details
Main authors:	Kontothanassis, L. I., Sugumar, R. A., Faanes, G. J., Smith, J. E., Scott, M. L.
Format:	Conference paper
Language:	English
Subjects:	Computer systems organization; Computer systems organization > Dependable and fault-tolerant systems and networks; General and reference; General and reference > Cross-computing tools and techniques; General and reference > Cross-computing tools and techniques > Performance; Hardware; Hardware > Integrated circuits; Hardware > Integrated circuits > Semiconductor memory; Hardware > Integrated circuits > Semiconductor memory > Dynamic memory; Networks; Networks > Network performance evaluation; Social and professional topics; Social and professional topics > Professional topics; Social and professional topics > Professional topics > Computing profession; Social and professional topics > Professional topics > Computing profession > Testing, certification and licensing

Description
Abstract:	Traditional supercomputers use a flat multi-bank SRAM memory organization to supply high bandwidth at low latency. Most other computers use a hierarchical organization with a small SRAM cache and slower, cheaper DRAM for main memory. Such systems rely heavily on data locality for achieving optimum performance. This paper evaluates cache-based memory systems for vector supercomputers. We develop a simulation model for a cache-based version of the Cray Research C90 and use the NAS Parallel Benchmarks to provide a large-scale workload. We show that while caches reduce memory traffic and improve the performance of plain DRAM memory, they still lag behind cacheless SRAM. We identify the performance bottlenecks in DRAM-based memory systems and quantify their contribution to program performance degradation. We find the data fetch strategy to be a significant parameter affecting performance, evaluate the performance of several fetch policies, and show that small fetch sizes improve performance by maximizing the use of available memory bandwidth.
DOI:	10.5555/602770.602815
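The abstract's final claim, that small fetch sizes make better use of available memory bandwidth, can be illustrated with a toy calculation. The sketch below is not the authors' C90 simulation model. It assumes a single strided vector access, word-addressed memory, fetch blocks aligned to their size, and no reuse of fetched data by later accesses; the function name `useful_fraction` is hypothetical.

```python
def useful_fraction(stride_words: int, line_words: int, n_elements: int) -> float:
    """Toy model (assumption): fraction of fetched words that a single
    strided vector access actually uses, with aligned fetch blocks of
    `line_words` words and no reuse between accesses."""
    if stride_words < 1 or line_words < 1 or n_elements < 1:
        raise ValueError("arguments must be positive")
    # Word addresses touched by the vector access: 0, s, 2s, ...
    addresses = (i * stride_words for i in range(n_elements))
    # Each distinct aligned block touched is fetched once in full.
    lines_touched = {addr // line_words for addr in addresses}
    words_fetched = len(lines_touched) * line_words
    return n_elements / words_fetched


if __name__ == "__main__":
    # Stride-8 access of 64 elements under increasing fetch sizes.
    for line in (1, 2, 4, 8, 16):
        print(line, useful_fraction(8, line, 64))
```

In this model a 64-element access with a stride of 8 words uses every fetched word when the fetch size is 1 word, but only 1/8 of the transferred data once blocks reach 8 or more words, so most of the bandwidth carries unused data. A unit-stride access of the same length uses all fetched words at every block size shown, which suggests why fetch size matters most for non-unit strides.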