The Storage Versus Repair-Bandwidth Trade-off for Clustered Storage Systems

We study a generalization of the setting of regenerating codes, motivated by applications to storage systems consisting of clusters of storage nodes. There are n clusters in total, with m nodes per cluster. A data file is coded and stored across the mn nodes, with each node storing \alpha sy...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on information theory 2018-08, Vol.64 (8), p.5783-5805
Hauptverfasser: Prakash, N., Abdrashitov, Vitaly, Medard, Muriel
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:We study a generalization of the setting of regenerating codes, motivated by applications to storage systems consisting of clusters of storage nodes. There are n clusters in total, with m nodes per cluster. A data file is coded and stored across the mn nodes, with each node storing \alpha symbols. For availability of data, we require that the file be retrievable by downloading the entire content from any subset of k clusters. Nodes represent entities that can fail. We distinguish between intra-cluster and inter-cluster bandwidth (BW) costs during node repair. Node-repair in a cluster is accomplished by downloading \beta symbols each from any set of d other clusters, dubbed remote helper clusters, and also up to \alpha symbols each from any set of \ell surviving nodes, dubbed local helper nodes, in the host cluster. We first identify the optimal trade-off between storage-overhead and inter-cluster repair-bandwidth under functional repair, and also present optimal exact-repair code constructions for a class of parameters. The new trade-off is strictly better than what is achievable via space-sharing existing coding solutions, whenever \ell > 0 . We then obtain sharp lower bounds on the necessary intra-cluster repair BW to achieve optimal trade-off. Under functional repair, random linear network codes (RLNCs) simultaneously optimize usage of both inter- and intra-cluster repair BW; simulation results based on RLNCs suggest optimality of the bounds on intra-cluster repair-bandwidth. Our bounds reveal the interesting fact that, while it is beneficial to increase the number of local helper nodes \ell in order to improve the storage-vs-inter-cluster-repair-BW trade-off, increa
ISSN:0018-9448
1557-9654
DOI:10.1109/TIT.2018.2806342