Early load address resolution via register tracking
Higher microprocessor frequencies accentuate the performance cost of memory accesses. This is especially noticeable in the Intel's IA32 architecture where lack of registers results in increased number of memory accesses. This paper presents novel, non-speculative technique that partially hides...
Gespeichert in:
Hauptverfasser: | , , , , , |
---|---|
Format: | Tagungsbericht |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Higher microprocessor frequencies accentuate the performance cost of memory accesses. This is especially noticeable in the Intel's IA32 architecture where lack of registers results in increased number of memory accesses. This paper presents novel, non-speculative technique that partially hides the increasing load-to-use latency, by allowing the early issue of load instructions. Early load address resolution relies on register tracking to safely compute the addresses of memory references in the front-end part of the processor pipeline. Register tracking enables decode-time computation of register values by tracking simple operations of the form reg±immediate. Register tracking may be performed in any pipeline stage following instruction decode and prior to execution.
Several tracking schemes are proposed in this paper: Stack pointer tracking allows safe early resolution of stack references by keeping track of the value of the ESP register (the stack pointer). About 25% of all loads are stack loads and 95% of these loads may be resolved in the front-end. Absolute address tracking allows the early resolution of constant-address loads. Displacement-based tracking tackles all loads with addresses of the form reg±immediate by tracking the values of all general-purpose registers. This class corresponds to 82% of all loads, and about 65% of these loads can be safely resolved in the front-end pipeline.
The paper describes the tracking schemes, analyzes their performance potential in a deeply pipelined processor and discusses the integration of tracking with memory disambiguation. |
---|---|
DOI: | 10.1145/339647.339705 |