Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Instruction Execution
Format: | Article |
---|---|
Language: | English |
Abstract: | Load instructions often limit instruction-level parallelism (ILP) in modern
processors due to the data and resource dependences they cause. Prior techniques
like Load Value Prediction (LVP) and Memory Renaming (MRN) mitigate load data
dependence by predicting the data value of a load instruction. However, they
fail to mitigate load resource dependence because the predicted load instruction
is still executed.
Our goal in this work is to improve ILP by mitigating both load data
dependence and load resource dependence. To this end, we propose a
purely microarchitectural technique called Constable that safely eliminates
the execution of load instructions. Constable dynamically identifies load
instructions that have repeatedly fetched the same data from the same load
address. We call such loads likely-stable. For every likely-stable load,
Constable (1) tracks modifications to its source architectural registers and
memory location via lightweight hardware structures, and (2) eliminates the
execution of subsequent instances of the load instruction until there is a
write to its source register or a store or snoop request to its load address.
Our extensive evaluation across a diverse set of 90 workloads shows that
Constable improves performance by 5.1% while reducing the core dynamic power
consumption by 3.4% on average over a strong baseline system that implements
MRN and other dynamic instruction optimizations (e.g., move and zero
elimination, constant and branch folding). In the presence of 2-way simultaneous
multithreading (SMT), Constable's performance improvement increases to 8.8%
over the baseline system. When combined with a state-of-the-art load value
predictor (EVES), Constable provides an additional 3.7% and 7.8% average
performance benefit over the load value predictor alone, in the baseline system
without and with 2-way SMT, respectively. |
---|---|
DOI: | 10.48550/arxiv.2406.18786 |