Reducing operand communication overhead using instruction clustering for multimedia applications

As technology trends yield shorter cycle times and larger, wider datapaths in architectures for multimedia systems, global broadcast networks for operand communication are becoming a major bottleneck in processor performance. New low latency operand transport techniques are needed. This paper propos...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Hongkyu Kim, Wills, D.S., Wills, L.M.
Format: Tagungsbericht
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:As technology trends yield shorter cycle times and larger, wider datapaths in architectures for multimedia systems, global broadcast networks for operand communication are becoming a major bottleneck in processor performance. New low latency operand transport techniques are needed. This paper proposes and evaluates lower cost mechanisms than traditional bypass networks, exploiting regular operand distribution patterns in multimedia applications. To reduce latency associated with operand movement within a datapath, our mechanism, called dynamic instruction clustering, groups chains of dependent instructions within a basic block at runtime, identifies intermediate value transportation, and schedules it on networked ALUs which are connected by a local dedicated network. By converting global communication into local, the transport latency can be minimized and the critical path of the application code can be executed in consecutive, shortened cycles, resulting in improved performance. We demonstrated that 28% and 30% of total dependence edges residing in the instruction window can be localized on 8 and 16-way machines, respectively. Our results show that the overall performance gains over a wide range of multimedia applications are 16% for 8-way and 35% for 16-way on average.
DOI:10.1109/ISM.2005.95