DoppelGanger++: Towards Fast Dependency Graph Generation for Database Replay

A database replay system (DRS) captures workloads on a production system and then replays them in a test system to test various system changes, avoiding any risk before realizing them in production. The dependency graph generation in a DRS is crucial in preserving output determinism while maximizing...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Proceedings of the ACM on management of data 2024-03, Vol.2 (1), p.1-26, Article 67
Hauptverfasser:	Lee, Wonseok, Ha, Jaehyun, Han, Wook-Shin, Park, Changgyoo, Park, Myunggon, Han, Juhyeng, Lee, Juchang
Format:	Artikel
Sprache:	eng
Schlagworte:	Data management systems Database administration Database performance evaluation Database utilities and tools Information systems
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	A database replay system (DRS) captures workloads on a production system and then replays them in a test system to test various system changes, avoiding any risk before realizing them in production. The dependency graph generation in a DRS is crucial in preserving output determinism while maximizing concurrency. The state-of-the-art dependency graph generation algorithm deployed in a commercial DBMS uses a generate-and-prune strategy. It first generates a dependency graph by performing backward scans for each request in a workload. It then prunes all redundant edges using an expensive, transitive reduction algorithm. However, we notice that this generates a large dependency graph that contains many redundant edges and its worst-case time complexity is quadratic to the number of requests in a workload. In order to solve these challenging problems, we formally propose four classes of dependency graphs for DRSs. We then present a stateful single forward scan algorithm, SSFS, to generate any class of dependency graphs by performing a single scan over all requests while succinctly maintaining states. Here, states refer to information that is stored and maintained for efficient dependency graph generation. We also propose the parallel SSFS to utilize the computation power with multi-core CPUs while balancing the loads. We implemented our DRS in a leading commercial DBMS. Extensive experiments using the TPC-C, SD benchmarks, and a real-world customer workload show that our DRS significantly improves the dependency graph generation time by up to two orders of magnitude, compared to the state-of-the-art.
ISSN:	2836-6573 2836-6573
DOI:	10.1145/3639322