Tools for Predicting the Reliability of Large-Scale Storage Systems
Data-intensive applications require extreme scaling of their underlying storage systems. Such scaling, together with the fact that storage systems must be implemented in actual data centers, increases the risk of data loss from failures of underlying components. Accurate engineering requires quantit...
Gespeichert in:
Veröffentlicht in: | ACM transactions on storage 2016-08, Vol.12 (4), p.1-30 |
---|---|
1. Verfasser: | |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Data-intensive applications require extreme scaling of their underlying storage systems. Such scaling, together with the fact that storage systems must be implemented in actual data centers, increases the risk of data loss from failures of underlying components. Accurate engineering requires quantitatively predicting reliability, but this remains challenging due to the need to account for extreme scale, redundancy scheme type and strength, distribution architecture, and component dependencies. This article introduces CQS
im
-R, a tool suite for predicting the reliability of large-scale storage system designs and deployments. CQS
im
-R includes (a) direct calculations based on an only-drives-fail failure model and (b) an event-based simulator for detailed prediction that handles failures of and failure dependencies among arbitrary (drive or nondrive) components. These are based on a common combinatorial framework for modeling placement strategies. The article demonstrates CQS
im
-R using models of common storage systems, including replicated and erasure coded designs. New results, such as the poor reliability scaling of spread-placed systems and a quantification of the impact of data center distribution and rack-awareness on reliability, demonstrate the usefulness and generality of the tools. Analysis and empirical studies show the tools’ soundness, performance, and scalability. |
---|---|
ISSN: | 1553-3077 1553-3093 |
DOI: | 10.1145/2911987 |