Application-Specific Cross-Layer Optimization Based on Predictive Variable-Latency VLSI Design

Traditional synchronous VLSI design requires that all computations in a logic stage complete in one clock cycle. This leads to increasingly pessimistic design as technology scaling introduces increasingly significant parametric variations that result in an increasing performance variability. Alterna...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	ACM journal on emerging technologies in computing systems 2015-09, Vol.12 (3), p.1-19
Hauptverfasser:	De, Vivek K., Kahng, Andrew B., Karnik, Tanay, Liu, Bao, Maleki, Milad, Wang, Lu
Format:	Artikel
Sprache:	eng
Schlagworte:	Clocks Computation Design engineering Integrated circuits Logic Methodology Optimization Very large scale integration
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Traditional synchronous VLSI design requires that all computations in a logic stage complete in one clock cycle. This leads to increasingly pessimistic design as technology scaling introduces increasingly significant parametric variations that result in an increasing performance variability. Alternatively, by allowing computations in a logic stage to complete in a variable number of clock cycles, variable-latency design provides relaxed timing constraints for average performance, area, and power consumption optimization. In this article, we present improved variable-latency design techniques including: (1) a generic minimum-intrusion variable-latency VLSI design paradigm, (2) a signal probability-based approximate prediction logic construction method for minimum misprediction rate at minimum cost, and (3) an application-specific cross-layer analysis methodology. Our experiments show that the proposed variable-latency design methodology on average reduces the computation latency by 26.80%(14.65%) at cost of 0.08%(3.4%) area and 0.4%(2.2%) energy consumption increase for the interger (floating point) unit of an open-source SPARC V8 processor LEON2 synthesized with a clock-cycle time between 1.97ns(3.49ns) and 5.96ns(13.74ns) based on the 45nm Nangate open cell library, while an automotive application-specific design further achieves an average latency reduction of 41.8%.
ISSN:	1550-4832 1550-4840
DOI:	10.1145/2746341