Reliability-Aware Runtime Adaption Through a Statically Generated Task Schedule
Published in: | IEEE transactions on very large scale integration (VLSI) systems 2018-01, Vol.26 (1), p.11-22 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | English |
Abstract: | Device scaling, increasing number of components in a single chip, varying environmental issues, and aging effects have brought severe reliability challenges that impose tight constraints on the operation of a system. To cope with these challenges, this paper proposes a reliability-aware scheduling framework that combines static and dynamic analyses to improve the overall system resiliency to different kinds of faults (i.e., intermittent, transient, and permanent). The static analysis technique employs genetic algorithms to optimize the overall system reliability by considering reliability level (RL) as an intermediate scheduling dimension and creating a task-to-RL mapping. This enables the RL-to-core mapping to be efficiently adapted at runtime according to fault rate variations, while the task-to-RL mapping can still be reused. The dynamic analysis tracks faults appearing in each core and measures the time correlation of those faults to update the RL-to-core mapping. The proposed reliability-aware framework is implemented in a state-of-the-art runtime system, Delaware Adaptive Run-Time System, so as to quantitatively show the advantages of using the overall framework in existing multicore platforms. Experimental results show that the proposed technique delivers up to 30% improvement in application execution time and up to 72% improvement in faults occurring at runtime. |
ISSN: | 1063-8210, 1557-9999 |
DOI: | 10.1109/TVLSI.2017.2753242 |
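
The two-level mapping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the task names, core names, the hard-coded `TASK_TO_RL` table (standing in for the genetic-algorithm result), the exponential-decay fault score (standing in for the paper's time-correlation measure), and the helper names `CoreFaultTracker`, `remap_rl_to_core`, and `core_for_task` are all assumptions made for illustration. The point it shows is that the static task-to-RL table is never touched at runtime; only the small RL-to-core table is rebuilt when fault behaviour changes.

```python
from dataclasses import dataclass, field

# Hypothetical static result: each task is bound to a reliability level (RL),
# with RL 0 assumed to be the most demanding level. The paper derives this
# mapping with a genetic algorithm; here it is simply hard-coded.
TASK_TO_RL = {"t0": 0, "t1": 1, "t2": 0, "t3": 2}


@dataclass
class CoreFaultTracker:
    """Records the times at which faults were observed on one core."""
    fault_times: list = field(default_factory=list)

    def record(self, t: float) -> None:
        self.fault_times.append(t)

    def score(self, now: float, half_life: float = 10.0) -> float:
        # Assumed heuristic: each fault's weight halves every `half_life`
        # time units, so recent faults dominate the score.
        return sum(0.5 ** ((now - t) / half_life) for t in self.fault_times)


def remap_rl_to_core(trackers: dict, num_levels: int, now: float) -> dict:
    """Rebuild the RL-to-core table: lowest fault score gets RL 0, and so on."""
    if len(trackers) < num_levels:
        raise ValueError("need at least one core per reliability level")
    # Ties are broken by core name so the result is deterministic.
    ranked = sorted(trackers, key=lambda c: (trackers[c].score(now), c))
    return {rl: ranked[rl] for rl in range(num_levels)}


def core_for_task(task: str, rl_to_core: dict) -> str:
    """Resolve a task to a core through its (unchanged) reliability level."""
    return rl_to_core[TASK_TO_RL[task]]


if __name__ == "__main__":
    trackers = {name: CoreFaultTracker() for name in ("c0", "c1", "c2")}
    trackers["c0"].record(9.0)   # two recent faults on c0
    trackers["c0"].record(9.5)
    trackers["c2"].record(0.0)   # one old fault on c2
    rl_to_core = remap_rl_to_core(trackers, num_levels=3, now=10.0)
    for task in sorted(TASK_TO_RL):
        print(task, "->", core_for_task(task, rl_to_core))
```

Keeping the reliability level as an indirection means a runtime update costs one sort over the cores rather than a new schedule search over all tasks, which is the reuse property the abstract attributes to the task-to-RL mapping.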