Taming Tail Latency for Erasure-coded, Distributed Storage Systems
Format: | Article |
Language: | English |
Abstract: | Distributed storage systems are known to be susceptible to long tails in
response time. In the online storage systems behind services such as Bing, Facebook, and
Amazon, the long tail of service latency is of particular concern, with
99.9th-percentile response times being orders of magnitude worse than the mean.
Although erasure codes have emerged as a popular technique for achieving high data
reliability in distributed storage with good space efficiency, taming tail latency
in such systems remains an open problem due to the lack of mathematical models for
analyzing them. To this end, we propose a framework for quantifying and
optimizing tail latency in erasure-coded storage systems. In particular, we
derive closed-form upper bounds on tail latency for arbitrary service-time
distributions and heterogeneous files. Based on this model, we formulate an
optimization problem that jointly minimizes the weighted latency tail probability
of all files over the placement of files on servers and the choice of
servers used to access each requested file. The resulting non-convex problem is solved with
an efficient alternating optimization algorithm. Numerical results show a
significant reduction of tail latency for erasure-coded storage systems under a
realistic workload. |
DOI: | 10.48550/arxiv.1703.08337 |
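The abstract does not reproduce the bound itself. As a hedged illustration only, and not necessarily the derivation used in the paper, the Python sketch below shows one standard way to obtain a closed-form tail bound for an erasure-coded read. It rests on assumptions introduced here: each server is modeled as an independent M/G/1 FCFS queue with Poisson chunk arrivals; a request for file i reads a chunk from server j with probability pi[i][j] (a hypothetical probabilistic-scheduling parameter whose row sums to the number of chunks k_i needed to decode); and the file's latency is the largest sojourn time among the servers it reads from. A union bound then gives Pr(L_i >= x) <= sum_j pi_ij * Pr(Q_j >= x), and each term is bounded with a Chernoff bound built on the Pollaczek-Khinchine moment generating function of the M/G/1 sojourn time. All function names (exp_service_mgf, mg1_sojourn_mgf, chernoff_tail, weighted_tail_objective) are illustrative.

```python
import math
from typing import Callable, List, Sequence

Mgf = Callable[[float], float]  # moment generating function t -> E[exp(t * X)]


def exp_service_mgf(mu: float) -> Mgf:
    """MGF of an Exp(mu) service time: mu / (mu - t) for t < mu, else infinite."""
    return lambda t: mu / (mu - t) if t < mu else math.inf


def mg1_sojourn_mgf(t: float, lam: float, service_mgf: Mgf, mean_service: float) -> float:
    """E[exp(t * Q)] for the sojourn time Q of a stable M/G/1 FCFS queue.

    Pollaczek-Khinchine transform: (1 - rho) * t * B(t) / (t - lam * (B(t) - 1)),
    where B is the service-time MGF. Returns inf where the transform diverges.
    """
    rho = lam * mean_service
    if rho >= 1.0:
        raise ValueError("queue is unstable (rho >= 1)")
    if t <= 0.0:
        return math.inf  # only t > 0 is usable for an upper-tail Chernoff bound
    b = service_mgf(t)
    if not math.isfinite(b):
        return math.inf
    denom = t - lam * (b - 1.0)
    if denom <= 0.0:
        return math.inf  # outside the region of convergence
    return (1.0 - rho) * t * b / denom


def chernoff_tail(x: float, lam: float, service_mgf: Mgf, mean_service: float,
                  t_grid: Sequence[float]) -> float:
    """Pr(Q >= x) <= min over t in t_grid of exp(-t x) * E[exp(t Q)], capped at 1."""
    best = 1.0
    for t in t_grid:
        m = mg1_sojourn_mgf(t, lam, service_mgf, mean_service)
        if math.isfinite(m):
            best = min(best, math.exp(-t * x) * m)
    return best


def weighted_tail_objective(x: float, weights: Sequence[float], arrival_rates: Sequence[float],
                            pi: List[List[float]], service_mgfs: Sequence[Mgf],
                            mean_services: Sequence[float], t_grid: Sequence[float]) -> float:
    """Weighted sum over files of union bounds on Pr(L_i >= x).

    pi[i][j]: hypothetical probability that a request for file i reads from server j.
    Server j then sees Poisson chunk arrivals at rate sum_i arrival_rates[i] * pi[i][j].
    """
    n_servers = len(service_mgfs)
    server_rates = [sum(lam_i * row[j] for lam_i, row in zip(arrival_rates, pi))
                    for j in range(n_servers)]
    server_tails = [chernoff_tail(x, server_rates[j], service_mgfs[j], mean_services[j], t_grid)
                    for j in range(n_servers)]
    total = 0.0
    for w, row in zip(weights, pi):
        # Union bound over the servers this file may read from, capped at 1.
        file_bound = min(1.0, sum(p * q for p, q in zip(row, server_tails)))
        total += w * file_bound
    return total


if __name__ == "__main__":
    t_grid = [i / 1000 for i in range(1, 1000)]
    mgfs = [exp_service_mgf(1.0)] * 3
    # Two files, each stored with a (3, 2) code on all three servers, chunks chosen uniformly.
    pi = [[2 / 3] * 3, [2 / 3] * 3]
    obj = weighted_tail_objective(10.0, [1.0, 1.0], [0.3, 0.2], pi, mgfs, [1.0] * 3, t_grid)
    print(f"weighted tail bound at x = 10: {obj:.4f}")
```

The grid search over t stands in for a proper one-dimensional minimization and keeps the sketch dependency-free. Choosing t separately for each server keeps every term a valid Chernoff bound; an optimizer over file placement and pi, such as the alternating scheme the abstract mentions, would evaluate an objective of this kind in its inner loop.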