Variability Mitigation in Nanometer CMOS Integrated Systems: A Survey of Techniques From Circuits to Software
Published in: | Proceedings of the IEEE 2016-07, Vol.104 (7), p.1410-1448 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Abstract: | Variation in performance and power across manufactured parts and their operating conditions is an accepted reality in modern microelectronic manufacturing processes with nanometer-scale geometries. This article surveys challenges and opportunities in identifying variations and their effects, and methods to combat these variations for improved microelectronic devices. We focus on computing devices and their design at various levels to combat variability. First, we provide a review of key concepts with particular emphasis on timing errors caused by various variability sources. We consider methods to predict and prevent such errors, methods to detect and correct them, and finally conditions under which they can be accepted; we also consider their implications for cost, performance, and quality. We provide a comparative evaluation of methods for deployment across various layers of the system, from circuits and architecture to application software. These can be combined in various ways to achieve specific goals related to observability and controllability of the variability effects, providing means to achieve cross-layer or hybrid resilience. We then provide examples of real-world resilient single-core and parallel architectures. We find that parallel architectures and parallelism in general provide the best means to combat and exploit variability to design resilient and efficient systems. Using programmable accelerator architectures such as clustered processing elements and GP-GPUs, we show how system designers can coordinate propagation of timing error information and its effects along with new techniques for memoization (i.e., spatial or temporal reuse of computation). This discussion naturally leads to the use of these techniques in the emerging area of "approximate computing," and to how they can be used in building resilient and efficient computing systems. We conclude with an outlook for the emerging field. |
---|---|
ISSN: | 0018-9219 1558-2256 |
DOI: | 10.1109/JPROC.2016.2518864 |