Optimal Generalized H-Tree Topology and Buffering for High-Performance and Low-Power Clock Distribution

Bibliographic Details
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020-02, Vol. 39 (2), p. 478-491
Main authors: Han, Kwangsoo; Kahng, Andrew B.; Li, Jiajia
Format: Article
Language: English
Description
Summary: Clock power, skew, and maximum latency are three key metrics for clock distribution in low-power and high-performance designs. An H-tree offers minimum clock skew and good robustness against variations, but at the cost of large wirelength and clock power. A "fishbone" clock network with a spine-and-ribs structure, on the other hand, has smaller wirelength, latency, and clock power than an H-tree, but larger skew. No previous work enables systematic exploration of the regime between H-tree and spine to achieve an optimal tradeoff among clock power, skew, and latency. In this paper, we study the concept of a generalized H-tree (GH-tree), a topologically balanced tree with an arbitrary sequence of branching factors, and propose a dynamic programming-based method to determine optimal clock power, skew, and latency in the space of GH-tree solutions. Our method co-optimizes clock tree topology and buffering along branches according to fitted electrical models. We further propose a balanced K-means clustering and a linear programming (LP)-guided buffer placement approach to embed the GH-tree with respect to a given sink placement. We validate our solutions in commercial clock tree synthesis (CTS) tool flows, in a commercial foundry's 28LP technology. The results show up to 30% clock power reduction, with skew and maximum latency similar to those of CTS solutions from recent versions of leading commercial place-and-route tools. Our proposed approach also achieves up to 56% clock power reduction, with similar skew and maximum latency, compared to CTS solutions from a state-of-the-art academic tool.
ISSN: 0278-0070 (print), 1937-4151 (electronic)
DOI: 10.1109/TCAD.2018.2889756
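
The summary characterizes a GH-tree only by its sequence of branching factors. As a rough illustration of what that solution space looks like, the sketch below enumerates every ordered sequence of branching factors whose product equals a given number of leaf clusters. It is not the paper's dynamic programming method and uses no electrical model; the function names, the `max_factor` cap, and the assumption that the leaf count must factor exactly are all hypothetical choices made for this example.

```python
from typing import Iterator, List, Tuple


def branching_sequences(num_leaves: int, max_factor: int = 4) -> Iterator[Tuple[int, ...]]:
    """Yield each ordered sequence of branching factors (every factor in
    2..max_factor) whose product is exactly num_leaves.

    Hypothetical helper: the exact-product constraint and the factor cap
    are assumptions of this sketch, not taken from the paper.
    """
    if num_leaves < 1:
        raise ValueError("num_leaves must be a positive integer")
    if num_leaves == 1:
        # A single leaf needs no further branching.
        yield ()
        return
    for b in range(2, max_factor + 1):
        if num_leaves % b == 0:
            # Branch b ways at this level, then split the remaining fan-out.
            for rest in branching_sequences(num_leaves // b, max_factor):
                yield (b,) + rest


def nodes_per_level(seq: Tuple[int, ...]) -> List[int]:
    """Return the node count at each level, root first, for a balanced tree
    that branches seq[i] ways at level i."""
    counts = [1]
    for b in seq:
        counts.append(counts[-1] * b)
    return counts


if __name__ == "__main__":
    for seq in branching_sequences(8, max_factor=4):
        print(seq, nodes_per_level(seq))
```

For 8 leaves and factors capped at 4, the candidates are (2, 2, 2), (2, 4), and (4, 2). The all-2 sequence resembles the binary branching of a conventional H-tree, while sequences with larger factors give shallower trees; a real optimizer would score each candidate for power, skew, and latency rather than just listing them.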