Tracing Distributed Algorithms Using Replay Clocks
Main author: | |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Summary: | In this thesis, we introduce replay clocks (RepCl), a novel clock
infrastructure that allows us to do offline analyses of distributed
computations. The replay clock structure provides a methodology to replay a
computation as it happened, with the ability to represent concurrent events
effectively. It builds on the structures introduced by vector clocks (VC) and
the Hybrid Logical Clock (HLC), combining their infrastructures to provide
efficient replay. With such a clock, a user can replay a computation while
considering multiple execution paths, and check constraint violations and
other properties along the paths that are possible in the presence of concurrent
events. Specifically, if event e must occur before f, then the replay clock must
ensure that e is replayed before f. On the other hand, if e and f could occur
in any order, replay should not force an order between them. We demonstrate
that RepCl can be implemented with fewer than four integers for 64 processes for
various system parameters if clocks are synchronized within 1 ms. Furthermore,
the overhead of RepCl (for computing timestamps and message size) is
proportional to the size of the clock. Using simulations in a custom
distributed system and NS-3, a state-of-the-art network simulator, we identify
the expected overhead of RepCl. We also show how a user can identify the
feasibility region for RepCl, where unabridged replay is possible. Using
RepCl, we provide a tracer for distributed computations that allows any
computation using RepCl to be replayed efficiently. The visualization
allows users to analyze specific properties and constraints in an online
fashion, with the ability to consider concurrent paths independently. The
visualization provides per-process views and an overarching view of the whole
computation based on the time recorded by the RepCl for each event. |
---|---|
DOI: | 10.48550/arxiv.2407.00069 |
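
The ordering requirement stated in the summary (replay e before f when e must precede f, and force no order when they are concurrent) can be illustrated with a minimal sketch. It uses plain vector clocks, one of the structures RepCl builds on, and is not the RepCl encoding itself. The function names and the example timestamps are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def happened_before(a: Sequence[int], b: Sequence[int]) -> bool:
    """Return True if vector timestamp a causally precedes b.

    Assumes both timestamps have one entry per process (equal length).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def concurrent(a: Sequence[int], b: Sequence[int]) -> bool:
    """Return True if neither timestamp precedes the other and they differ."""
    return (
        tuple(a) != tuple(b)
        and not happened_before(a, b)
        and not happened_before(b, a)
    )


def replay_order_is_valid(order: List[str], clocks: Dict[str, Sequence[int]]) -> bool:
    """Check that a replay order never places an event before one of its causes.

    order:  event names in the order they are replayed (hypothetical format)
    clocks: event name -> vector timestamp
    """
    for i, later in enumerate(order):
        for earlier in order[:i]:
            # Violation: an event replayed later causally precedes an earlier one.
            if happened_before(clocks[later], clocks[earlier]):
                return False
    return True


# Hypothetical two-process example: e and g are concurrent, both precede f.
example_clocks = {"e": (1, 0), "g": (0, 1), "f": (2, 1)}
```

With these example timestamps, both `["e", "g", "f"]` and `["g", "e", "f"]` are acceptable replay orders, since e and g are concurrent, while any order that places f before e or g is rejected.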