Multiple clock and voltage domains for chip multi processors

Bibliographic details
Main authors: Rotem, Efraim; Mendelson, Avi; Ginosar, Ran; Weiser, Uri
Format: Conference proceeding
Language: English
Description
Summary: Power and thermal limits are major constraints on delivering compute performance in high-end CPUs and are expected to remain so. Chip multiprocessors (CMPs) are becoming important because they deliver more compute performance within those power constraints. Dynamic Voltage and Frequency Scaling (DVFS) has been studied in past work as a means to save power and improve overall processor performance while meeting total power and/or thermal constraints. For such systems, power delivery limitations are becoming a significant practical design consideration; unfortunately, many research works have almost entirely ignored this aspect of the design. This paper explores the possible topologies for building a high-end multi-core CPU and the available policies that maximize performance within the set of physical limitations. It evaluates single and multiple voltage and frequency domains and introduces a new clustered topology that groups several cores together. A hybrid model is introduced that combines measurements of a real CPU, a cycle-accurate simulator, and an analytical model. The results indicate that taking power delivery limitations into account changes the conclusions reached when such limitations are ignored. The paper shows that a single power domain topology performs up to 30% better than multiple power domains on lightly threaded workloads. For fully threaded applications the results differ. The clustered topology performs well for any number of threads.
ISSN:1072-4451
DOI:10.1145/1669112.1669170