The impact of incorrectly speculated memory operations in a multithreaded architecture

The speculated execution of threads in a multithreaded architecture, plus the branch prediction used in each thread execution unit, allows many instructions to be executed speculatively, that is, before it is known whether they actually needed by the program. In this study, we examine how the load i...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on parallel and distributed systems 2005-03, Vol.16 (3), p.271-285
Hauptverfasser:	Sendag, R., Ying Chen, Lilja, D.J.
Format:	Artikel
Sprache:	eng
Schlagworte:	Benchmark testing Communication networks Computer memory Delay Load mispredicted loads multithreaded architecture Pipelines Pollution Prefetching Registers Speculation Studies System performance Traffic control wrong execution wrong execution cache
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	The speculated execution of threads in a multithreaded architecture, plus the branch prediction used in each thread execution unit, allows many instructions to be executed speculatively, that is, before it is known whether they actually needed by the program. In this study, we examine how the load instructions executed on what turn out to be incorrectly executed program paths impact the memory system performance. We find that incorrect speculation (wrong execution) on the instruction and thread-level provides an indirect prefetching effect for the later correct execution paths and threads. By continuing to execute the mispredicted load instructions even after the instruction or thread-level control speculation is known to be incorrect, the cache misses observed on the correctly executed paths can be reduced by 16 to 73 percent, with an average reduction of 45 percent. However, we also find that these extra loads can increase the amount of memory traffic and can pollute the cache. We introduce the small, fully associative wrong execution cache (WEC) to eliminate the potential pollution that can be caused by the execution of the mispredicted load instructions. Our simulation results show that the WEC can improve the performance of a concurrent multithreaded architecture up to 18.5 percent on the benchmark programs tested, with an average improvement of 9.7 percent, due to the reductions in the number of cache misses.
ISSN:	1045-9219 1558-2183
DOI:	10.1109/TPDS.2005.36