Tracing Distributed Algorithms Using Replay Clocks

In this thesis, we introduce replay clocks (RepCl), a novel clock infrastructure that allows us to do offline analyses of distributed computations. The replay clock structure provides a methodology to replay a computation as it happened, with the ability to represent concurrent events effectively. I...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
1. Verfasser: Lagwankar, Ishaan
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:In this thesis, we introduce replay clocks (RepCl), a novel clock infrastructure that allows us to do offline analyses of distributed computations. The replay clock structure provides a methodology to replay a computation as it happened, with the ability to represent concurrent events effectively. It builds on the structures introduced by vector clocks (VC) and the Hybrid Logical Clock (HLC), combining their infrastructures to provide efficient replay. With such a clock, a user can replay a computation whilst considering multiple paths of executions, and check for constraint violations and properties that potential pathways could take in the presence of concurrent events. Specifically, if event e must occur before f then the replay clock must ensure that e is replayed before f. On the other hand, if e and f could occur in any order, replay should not force an order between them. We demonstrate that RepCl can be implemented with less than four integers for 64 processes for various system parameters if clocks are synchronized within 1ms. Furthermore, the overhead of RepCl (for computing timestamps and message size) is proportional to the size of the clock. Using simulations in a custom distributed system and NS-3, a state-of-the-art network simulator, we identify the expected overhead of RepCl. We also identify how a user can then identify feasibility region for RepCl, where unabridged replay is possible. Using the RepCl, we provide a tracer for distributed computations, that allows any computation using the RepCl to be replayed efficiently. The visualization allows users to analyze specific properties and constraints in an online fashion, with the ability to consider concurrent paths independently. The visualization provides per-process views and an overarching view of the whole computation based on the time recorded by the RepCl for each event.
DOI:10.48550/arxiv.2407.00069