Watermarks in stream processing systems: semantics and comparative analysis of Apache Flink and Google Cloud Dataflow
Published in: | Proceedings of the VLDB Endowment 2021-09, Vol.14 (12), p.3135-3147 |
---|---|
Main authors: | , , , , , , , |
Format: | Article |
Language: | eng |
Online access: | Full text |
Summary: | Streaming data processing is an exercise in taming disorder: from oftentimes huge torrents of information, we hope to extract powerful and timely analyses. But when dealing with streaming data, the unbounded and temporally disordered nature of real-world streams introduces a critical challenge: how does one reason about the completeness of a stream that never ends? In this paper, we present a comprehensive definition and analysis of watermarks, a key tool for reasoning about temporal completeness in infinite streams.
First, we describe what watermarks are and why they are important, highlighting how they address a suite of stream processing needs that are poorly served by eventually-consistent approaches:
• Computing a single correct answer, as in notifications.
• Reasoning about a lack of data, as in dip detection.
• Performing non-incremental processing over temporal subsets of an infinite stream, as in statistical anomaly detection with cubic spline models.
• Safely and punctually garbage collecting obsolete inputs and intermediate state.
• Surfacing a reliable signal of overall pipeline health.
Second, we describe, evaluate, and compare the semantically equivalent, but starkly different, watermark implementations in two modern stream processing engines: Apache Flink and Google Cloud Dataflow. |
---|---|
ISSN: | 2150-8097 |
DOI: | 10.14778/3476311.3476389 |