Exploiting instruction level parallelism in processors by caching scheduled groups

Bibliographic details
Main authors:	Nair, Ravi; Hopkins, Martin E.
Format:	Conference proceeding
Language:	English
Subjects:	Clocks; Computer systems organization > Architectures > Other architectures; Computer systems organization > Architectures > Parallel architectures > Very long instruction word; Computer systems organization > Architectures > Serial architectures > Complex instruction set computing; Computer systems organization > Architectures > Serial architectures > Reduced instruction set computing; Computer systems organization > Dependable and fault-tolerant systems and networks; Dynamic scheduling; Engines; Frequency; General and reference > Cross-computing tools and techniques > Performance; Hardware; Modems; Networks > Network performance evaluation; Out of order; Permission; Processor scheduling; Theory of computation > Design and analysis of algorithms > Approximation algorithms analysis > Scheduling algorithms; Theory of computation > Design and analysis of algorithms > Online algorithms > Online learning algorithms > Scheduling algorithms; Theory of computation > Models of computation > Concurrency; Theory of computation > Models of computation > Concurrency > Parallel computing models; Theory of computation > Theory and algorithms for application domains > Machine learning theory > Reinforcement learning > Sequential decision making; VLIW

Description
Summary:	Modern processors employ a large amount of hardware to dynamically detect parallelism in single-threaded programs and maintain the sequential semantics implied by these programs. The complexity of some of this hardware diminishes the gains due to parallelism because of longer clock period or increased pipeline latency of the machine. In this paper we propose a processor implementation which dynamically schedules groups of instructions while executing them on a fast simple engine and caches them for repeated execution on a fast VLIW-type engine. Our experiments show that scheduling groups spanning several basic blocks and caching these scheduled groups results in significant performance gain over fill buffer approaches for a standard VLIW cache. This concept, which we call DIF (Dynamic Instruction Formatting), unifies and extends principles underlying several schemes being proposed today to reduce superscalar processor complexity. This paper examines various issues in designing such a processor and presents results of experiments using trace-driven simulation of SPECint95 benchmark programs.
ISSN:	1063-6897 0163-5964 2575-713X
DOI:	10.1145/264107.264125
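
Illustration:	The summary describes scheduling a group of instructions once and caching the scheduled group for reuse. The toy sketch below is not taken from the paper and does not reproduce the DIF mechanism; it only illustrates the general idea under loud assumptions. The names Instr, schedule_group, GroupCache and demo_trace are hypothetical, only read-after-write dependencies are tracked (register renaming is assumed to remove the others), and the cache is keyed by the address of the first instruction in the group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: address, register written, registers read."""
    addr: int
    dest: str
    srcs: tuple = ()


def schedule_group(trace, width=4):
    """Greedily pack a sequential trace into rows of at most `width` instructions.

    Assumption: only read-after-write dependencies matter (renaming is assumed
    to remove write-after-read and write-after-write hazards).
    """
    rows = []    # each row stands in for one long instruction
    ready = {}   # register -> first row in which its new value may be read
    for ins in trace:
        # Earliest row allowed by the producers of this instruction's sources.
        row = max((ready.get(s, 0) for s in ins.srcs), default=0)
        # Skip rows that are already full.
        while row < len(rows) and len(rows[row]) >= width:
            row += 1
        if row == len(rows):
            rows.append([])
        rows[row].append(ins)
        ready[ins.dest] = row + 1
    return rows


class GroupCache:
    """Toy cache of scheduled groups, keyed by the group's starting address."""

    def __init__(self):
        self.groups = {}
        self.hits = 0
        self.misses = 0

    def fetch(self, trace, width=4):
        key = trace[0].addr
        if key in self.groups:
            self.hits += 1      # reuse the previously scheduled group
        else:
            self.misses += 1    # first encounter: schedule and store
            self.groups[key] = schedule_group(trace, width)
        return self.groups[key]


# Hypothetical four-instruction trace used only for demonstration.
demo_trace = [
    Instr(0, "r1"),
    Instr(4, "r2", ("r1",)),
    Instr(8, "r3"),
    Instr(12, "r4", ("r2", "r3")),
]
```

Calling GroupCache().fetch(demo_trace, width=2) twice schedules the group on the first call and returns the stored schedule on the second, which loosely mirrors the idea of paying the scheduling cost once and reusing the result on repeated execution.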