Stochastic distributed data stream partitioning using task locality: design, implementation, and optimization
Distributed stream processing engines ( DSPEs ) provide stream partitioning methods for distributing messages to tasks deployed in the distributed environment for real-time stream processing. Among these methods, the original locality-aware stream partitioning ( LSP ) is a binary LSP that sends mess...
Gespeichert in:
Veröffentlicht in: | The Journal of supercomputing 2021, Vol.77 (10), p.11353-11389 |
---|---|
Hauptverfasser: | , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Distributed stream processing engines
(
DSPEs
) provide stream partitioning methods for distributing messages to tasks deployed in the distributed environment for real-time stream processing. Among these methods, the original
locality-aware stream partitioning
(
LSP
) is a
binary LSP
that sends messages only to downstreams on the same node as upstreams. The binary LSP degrades performance at general configurations because it focuses only on task locality and does not consider downstream status like distributed batch processing engines. In this paper, we propose a
Stochastic LSP
(
SLSP
) method that considers not only task locality but also downstream status by computing stream partitioning probability based on the round-trip time to downstreams. We also present coarse-grained and fine-grained methods for probing downstreams at node-level and process-level, respectively. Then, we optimize our SLSP using a weighted closeness to prioritize the partitioning probabilities and a parallel thread model to process each stage of the SLSP in parallel. Finally, we implement the SLSP in Apache Storm, a representative DSPE, and empirically evaluate it with the binary LSP. Experimental results show that our SLSP greatly reduces latency by up to 208% while maintaining a similar throughput compared to the binary LSP at general configurations. These results indicate that our SLSP performs the optimized stream partitioning by reflecting downstream status as well as task locality. |
---|---|
ISSN: | 0920-8542 1573-0484 |
DOI: | 10.1007/s11227-021-03725-4 |