Introducing a Performance Model for Bandwidth-Limited Loop Kernels
Main authors: | , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Abstract: | We present a performance model for bandwidth-limited loop kernels that is founded on the analysis of modern cache-based microarchitectures. The model allows accurate performance prediction and evaluation for existing instruction codes. It provides an in-depth understanding of how performance at different memory hierarchy levels is made up. The performance of raw memory load, store and copy operations and a stream vector triad is analyzed and benchmarked on three modern x86-type quad-core architectures in order to demonstrate the capabilities of the model. |
---|---|
DOI: | 10.48550/arxiv.0905.0792 |
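
For orientation, the four kernels named in the abstract (load, store, copy, and a stream triad) can be sketched as plain loops, together with a naive count of the bytes each one moves per iteration. This is an illustrative sketch only, not the model from the paper. The triad form `a[i] = b[i] + s * c[i]`, the 8-byte element size, the write-allocate assumption for stores, and all function names are assumptions introduced here.

```python
# Illustrative sketch only. Kernel forms, names, and the simple traffic
# bound below are assumptions and are NOT taken from the paper.

def load(a):
    """Read every element of a (summed so the reads have an observable result)."""
    total = 0.0
    for x in a:
        total += x
    return total

def store(a, value):
    """Write the same value to every element of a."""
    for i in range(len(a)):
        a[i] = value

def copy(a, b):
    """Copy b into a element by element."""
    for i in range(len(b)):
        a[i] = b[i]

def triad(a, b, c, s):
    """Assumed STREAM-style triad: a[i] = b[i] + s * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + s * c[i]

# (load streams, store streams) touched per loop iteration
STREAMS = {"load": (1, 0), "store": (0, 1), "copy": (1, 1), "triad": (2, 1)}

def bytes_per_iteration(kernel, elem_bytes=8, write_allocate=True):
    """Naive traffic count: each store stream costs one extra read
    when the cache allocates the line before writing it."""
    loads, stores = STREAMS[kernel]
    extra_reads = stores if write_allocate else 0
    return (loads + stores + extra_reads) * elem_bytes

def max_iterations_per_second(kernel, bandwidth_bytes_per_s, **kwargs):
    """Upper bound on loop iterations per second for a given sustained bandwidth."""
    return bandwidth_bytes_per_s / bytes_per_iteration(kernel, **kwargs)
```

Under these assumptions the triad moves 32 bytes per iteration (two loads, one store, one write-allocate read), so dividing a measured bandwidth by that figure gives a rough ceiling on the iteration rate. The model described in the abstract goes further by analyzing how performance is made up at each memory hierarchy level.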