Store vectors for scalable memory dependence prediction and scheduling

Allowing loads to issue out-of-order with respect to earlier unresolved store addresses is very important for extracting parallelism in large-window superscalar processors. Blindly allowing all loads to issue as soon as their addresses are ready can lead to a net performance loss due to a large numb...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Samantika Subramaniam, Loh, G.H.
Format:	Tagungsbericht
Sprache:	eng
Schlagworte:	Computer aided instruction Educational institutions Job shop scheduling Microarchitecture Out of order Parallel processing Pipelines Prediction algorithms Processor scheduling Read-write memory
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Allowing loads to issue out-of-order with respect to earlier unresolved store addresses is very important for extracting parallelism in large-window superscalar processors. Blindly allowing all loads to issue as soon as their addresses are ready can lead to a net performance loss due to a large number of load-store ordering violations. Previous research has proposed memory dependence prediction algorithms to prevent only loads with true memory dependencies from issuing in the presence of unresolved stores. Techniques such as load-store pair identification and store sets have been very successful in achieving performance levels close to that attained by an oracle dependence predictor. These techniques tend to employ relatively complex CAM-based designs, which we believe have been obstacles to the industrial adoption of these algorithms. In this paper, we use the idea of dependency vectors from matrix schedulers for non-memory instructions, and adapt them to implement a new dependence prediction algorithm. For applications that experience frequent memory ordering violations, our "store vector" prediction algorithm delivers an 8.4% speedup over blind speculation (compared to 8.5% for perfect dependence prediction), achieves better performance than store sets (8.1%), and the store vector algorithm's matrix implementation is considerably simpler.
ISSN:	1530-0897 2378-203X
DOI:	10.1109/HPCA.2006.1598113