Meta's next-generation realtime monitoring and analytics platform

Bibliographic details
Published in: Proceedings of the VLDB Endowment, 2022-08, Vol. 15 (12), p. 3522-3534
Main authors: Harizopoulos, Stavros; Hopper, Taylor; Mo, Morton; Chandrasekaran, Shyam Sundar; Chen, Tongguang; Cui, Yan; Ganesh, Nandini; Helmling, Gary; Pham, Hieu; Wong, Sebastian
Format: Article
Language: English
Description
Abstract: Unlike traditional database systems where data and system availability are tied together, there is a wide class of systems targeting realtime monitoring and analytics over structured logs where these properties can be decoupled. In these systems, responsiveness and freshness of data are often more important than perfectly complete answers. One such system is Meta's Scuba [2]. Historically, Scuba has favored system availability along with speed and freshness of results over data completeness and durability. While these choices allowed Scuba to grow from terabyte scale to petabyte scale and continue onboarding a variety of use cases, they also came at an operational cost of dealing with incomplete data and managing data loss. In this paper, we present the next generation of Scuba's architecture, codenamed Kraken, which decouples storage management from the query serving system and introduces a single, durable source of truth. This enables tangible improvements to system fault tolerance and query performance while still respecting tolerable bounds of client-observed data freshness. We also describe the journey of how we deployed Kraken into full production as we gradually turned off the older system with no user-visible downtime.
ISSN: 2150-8097
DOI: 10.14778/3554821.3554841