QPOPSS: Query and Parallelism Optimized Space-Saving for Finding Frequent Stream Elements
Main Authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Summary: | The frequent elements problem, a key component in demanding stream-data
analytics, involves selecting elements whose occurrence exceeds a
user-specified threshold. Fast, memory-efficient $\epsilon$-approximate
synopsis algorithms select all frequent elements but may overestimate them
depending on $\epsilon$ (user-defined parameter). Evolving applications demand
performance only achievable by parallelization. However, algorithmic guarantees
concerning concurrent updates and queries have been overlooked. We propose
Query and Parallelism Optimized Space-Saving (QPOPSS), providing concurrency
guarantees. The design includes an implementation of the \emph{Space-Saving}
algorithm supporting fast queries, implying minimal overlap with concurrent
updates. QPOPSS integrates this with the distribution of work and fine-grained
synchronization among threads, swiftly balancing high throughput, high
accuracy, and low memory consumption. Our analysis, under various concurrency
and data distribution conditions, shows space and approximation bounds. Our
empirical evaluation relative to representative state-of-the-art methods
reveals that QPOPSS's multi-threaded throughput scales linearly while
maintaining the highest accuracy, with orders of magnitude smaller memory
footprint. |
---|---|
DOI: | 10.48550/arxiv.2409.01749 |
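
For readers unfamiliar with the Space-Saving algorithm that the summary builds on, the following is a minimal sketch of the classic sequential version only. It is not QPOPSS and does not reflect the paper's query-optimized or concurrent design. The function names `space_saving` and `frequent_elements`, the dictionary layout, and the choice of $\lceil 1/\epsilon \rceil$ counters are assumptions made for illustration.

```python
import math


def space_saving(stream, epsilon):
    """Classic sequential Space-Saving (illustrative sketch, not QPOPSS).

    Keeps at most m = ceil(1 / epsilon) counters (an assumed sizing).
    Returns (counts, errors): estimated counts and, per element, an
    upper bound on how much that estimate may exceed the true count.
    """
    m = math.ceil(1 / epsilon)
    counts = {}  # element -> estimated count (never below the true count)
    errors = {}  # element -> maximum possible overestimation
    for x in stream:
        if x in counts:
            counts[x] += 1
        elif len(counts) < m:
            counts[x] = 1
            errors[x] = 0
        else:
            # Evict an element with the minimum count; the newcomer
            # inherits that count plus one, so its error is the old minimum.
            victim = min(counts, key=counts.get)
            c_min = counts.pop(victim)
            errors.pop(victim)
            counts[x] = c_min + 1
            errors[x] = c_min
    return counts, errors


def frequent_elements(counts, n, phi):
    """Report every monitored element whose estimate exceeds phi * n."""
    return {x for x, c in counts.items() if c > phi * n}


if __name__ == "__main__":
    # Hypothetical stream: two heavy elements plus a tail of rare ones.
    stream = ["a"] * 50 + ["b"] * 30 + [f"x{i}" for i in range(20)]
    counts, errors = space_saving(stream, epsilon=0.1)
    print(frequent_elements(counts, len(stream), phi=0.25))
```

Because the counters always sum to the stream length $N$, the minimum counter is at most $N/m$, so each estimate overshoots the true count by at most $\epsilon N$ under this sizing. Estimates never undercount, which is why every truly frequent element is reported while some infrequent ones may be included. The linear scan for the minimum is chosen here for brevity; practical implementations use a structure that finds the minimum faster.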