Cloudy Forecast: How Predictable is Communication Latency in the Cloud?
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Summary: | Many systems and services rely on timing assumptions for performance and
availability in critical aspects of their operation, such as timeouts for
failure detectors or optimizations to concurrency control mechanisms. Many such
assumptions depend on the ability of different components to communicate on
time: a delay in communication may trigger the failure detector or cause the
system to enter a less-optimized execution mode. Unfortunately, these timing
assumptions are often set with little regard to the actual communication
guarantees of the underlying infrastructure, in particular the variability of
communication delays between processes on different nodes/servers. This
variability is especially pronounced for systems deployed in the public cloud,
since the cloud is a utility shared by many users and organizations, making it
prone to higher performance variance due to the noisy neighbor syndrome. In
this work, we present Cloud Latency Tester (CLT), a simple tool that measures
the variability of communication delays between nodes to help engineers set
proper values for their timing assumptions. We also provide our observational
analysis of running CLT in three major cloud providers and share the lessons
we learned. |
---|---|
DOI: | 10.48550/arxiv.2309.13169 |