Reliability Analysis of Storage Systems With Partially Repairable Devices

Bibliographic details
Published in: IEEE Transactions on Device and Materials Reliability, 2021-06, Vol. 21 (2), p. 267-272
Main author: Olmez, Serkay
Format: Magazine article
Language: English
Description
Summary: Modern storage devices such as hard disk drives (HDDs) and solid-state drives (SSDs) have reached capacities beyond 18 TB. When such a device fails, its data must be recovered from parities. Given the large capacities, the recovery process may take up to a few days, depending on the bandwidth and the erasure coding scheme implemented. During the recovery, the system is vulnerable to data loss if additional device failures occur. It is therefore important to complete the recovery as quickly as possible. The recovery can be accelerated if the data on the failed device is only partially corrupted and the remaining portion is still accessible. This is indeed the case for storage devices that consist of multiple physical recording subsystems. For example, modern HDDs have up to 18 heads, and SSDs have multiple flash chips. These subsystems may fail independently without affecting the rest of the components in the device. In this work, we study the durability of data when the device is allowed to stay online even after a number of subcomponents fail. In addition to extending the lifetime of the devices, this allows faster recovery of the critical data stored on the failed subsystem, which results in significant gains in the overall data durability of the storage system.
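The recovery-time argument in the summary can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the article: the function name `rebuild_hours`, the constant 50 MB/s effective rebuild bandwidth, and the assumption that data is spread evenly across the 18 heads are all illustrative choices.

```python
def rebuild_hours(capacity_tb: float, bandwidth_mb_s: float,
                  failed_fraction: float = 1.0) -> float:
    """Hours needed to re-create the lost data at a constant rebuild rate.

    failed_fraction is the share of the device that became unreadable
    (1.0 for a whole-device failure). Hypothetical helper, for illustration.
    """
    if not 0.0 < failed_fraction <= 1.0:
        raise ValueError("failed_fraction must be in (0, 1]")
    lost_bytes = capacity_tb * 1e12 * failed_fraction
    seconds = lost_bytes / (bandwidth_mb_s * 1e6)
    return seconds / 3600.0


# Assumed figures: 18 TB device, 50 MB/s effective rebuild bandwidth.
full = rebuild_hours(18, 50)             # whole device lost: about 100 hours
partial = rebuild_hours(18, 50, 1 / 18)  # one of 18 heads lost (even data spread assumed)

print(f"full rebuild:    {full:.1f} h")
print(f"partial rebuild: {partial:.1f} h")
print(f"speed-up:        {full / partial:.0f}x")
```

Under these assumptions a whole-device rebuild takes on the order of four days, while rebuilding the data behind a single failed head shrinks the window of vulnerability by roughly the number of heads. The durability gains reported in the article depend on its own failure model, which this sketch does not reproduce.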
ISSN: 1530-4388, 1558-2574
DOI: 10.1109/TDMR.2021.3077848