Modeling Remapping Based Fault Tolerance Techniques for Chip Multiprocessor Cache with Design Space Exploration


Bibliographic Details
Published in: Journal of Electronic Testing, 2020-02, Vol. 36 (1), pp. 59-73
Main authors: Choudhury, Avishek; Sikdar, Biplab K.
Format: Article
Language: English
Description
Abstract: On top of wear-out failures and external particle strikes, voltage scaling to reduce power consumption in multiprocessors makes caches more vulnerable to cell failures. Because voltage reduction is indispensable for prolonging the battery life of handheld devices, fault tolerance techniques are extremely important to ensure fault-free execution at near-threshold voltage. Several fault tolerance techniques have been proposed, and remapping based techniques have been found effective for fault tolerance in single core systems. This work proposes an analytical model for remapping based fault tolerance techniques to evaluate the effectiveness of such schemes in multicore systems. Two metrics, Expected Miss Ratio in Multicore (EMR_MC) and Expected Latency Ratio in Multicore (ELR_MC), are introduced to characterize the behavior of remapping based techniques. EMR_MC and ELR_MC are defined as functions of the probability of cell failure (P_fail), block size, number of cores, and number of threads. The system is simulated in Multi2sim 5.0, a multicore CPU-GPU simulator. The values of the metrics for different configuration parameters (probability of cell failure, number of cores, number of blocks, block size, and number of threads) are analysed to frame guidelines for configuring a system to deliver better performance under remapping based fault tolerance. It is observed that EMR_MC is proportional to P_fail and block size, inversely proportional to the number of cores and threads, and not affected by the number of blocks. On the contrary, ELR_MC is inversely proportional to P_fail and block size and proportional to the number of cores and threads. It is also observed that ELR_MC is independent of the number of cores and blocks. EMR_MC is best minimized for P_fail ≤ 1e-4, block size ≤ 64 bytes, number of cores ≥ 4 and number of threads ≥ 2. On the other hand, ELR_MC is best observed for P_fail ≤ 1e-4, block size ≥ 64 bytes, number of cores ≥ 4 and number of threads 2.
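
As a rough illustration of why block-level fault metrics grow with both the cell failure probability and the block size, the sketch below computes the probability that a cache block contains at least one faulty cell. It assumes independent, identically distributed bit-cell failures, which is a common back-of-the-envelope simplification and not the analytical model proposed in the article. The function name and the parameter values are illustrative assumptions.

```python
def block_fault_probability(p_fail: float, block_size_bytes: int) -> float:
    """Probability that a block has at least one faulty bit cell.

    Assumes every bit cell fails independently with probability p_fail
    (an illustrative assumption, not taken from the article).
    """
    bits = block_size_bytes * 8
    # A block is fault-free only if all of its cells are fault-free.
    return 1.0 - (1.0 - p_fail) ** bits


if __name__ == "__main__":
    # Sweep a few hypothetical operating points.
    for p_fail in (1e-5, 1e-4, 1e-3):
        for block_size in (32, 64, 128):
            p_block = block_fault_probability(p_fail, block_size)
            print(f"P_fail={p_fail:g}  block={block_size:>3} B  "
                  f"P(block faulty)={p_block:.4f}")
```

Under this simplified assumption, a larger block or a higher cell failure probability increases the chance that a block is faulty and has to be remapped.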
ISSN: 0923-8174, 1573-0727
DOI: 10.1007/s10836-019-05852-6