Cloudy Forecast: How Predictable is Communication Latency in the Cloud?

Many systems and services rely on timing assumptions for performance and availability to perform critical aspects of their operation, such as various timeouts for failure detectors or optimizations to concurrency control mechanisms. Many such assumptions rely on the ability of different components t...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Hilyard, Owen, Cui, Bocheng, Webster, Marielle, Muralikrishna, Abishek Bangalore, Charapko, Aleksey
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Many systems and services rely on timing assumptions for performance and availability to perform critical aspects of their operation, such as various timeouts for failure detectors or optimizations to concurrency control mechanisms. Many such assumptions rely on the ability of different components to communicate on time -- a delay in communication may trigger the failure detector or cause the system to enter a less-optimized execution mode. Unfortunately, these timing assumptions are often set with little regard to actual communication guarantees of the underlying infrastructure -- in particular, the variability of communication delays between processes in different nodes/servers. The higher communication variability holds especially true for systems deployed in the public cloud since the cloud is a utility shared by many users and organizations, making it prone to higher performance variance due to noisy neighbor syndrome. In this work, we present Cloud Latency Tester (CLT), a simple tool that can help measure the variability of communication delays between nodes to help engineers set proper values for their timing assumptions. We also provide our observational analysis of running CLT in three major cloud providers and share the lessons we learned.
DOI:10.48550/arxiv.2309.13169