Learning-based Sketches for Frequency Estimation in Data Streams without Ground Truth
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Summary: | Estimating the frequency of items in high-volume, fast data streams has
been extensively studied in many areas, such as databases and network
measurement. Traditional sketch algorithms can provide only rough estimates
under a limited memory budget. Although some learning-augmented algorithms
have been proposed recently, their offline framework requires actual
frequencies for training, which are generally difficult to obtain, and they are
too slow for real-time processing while their accuracy remains coarse-grained.
To this end, we propose a more practical learning-based estimation framework,
namely UCL-sketch, which follows the line of equation-based sketches to estimate
per-key frequencies. In a nutshell, there are two key techniques: online
training via equivalent learning without ground truth, and a highly scalable
architecture with logical estimation buckets. We conducted experiments on
both real-world and synthetic datasets. The results demonstrate that our method
greatly outperforms existing state-of-the-art sketches regarding per-key
accuracy and distribution, while preserving resource efficiency. Our code is
attached in the supplementary material and will be made publicly available at
https://github.com/Y-debug-sys/UCL-sketch. |
---|---|
DOI: | 10.48550/arxiv.2412.03611 |
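
For readers unfamiliar with the "traditional sketch algorithms" that the summary contrasts against, the following is a minimal sketch of a Count-Min sketch, a well-known classical frequency estimator. It is not UCL-sketch and is not taken from the authors' code. The class name, the `width` and `depth` parameters, and the use of salted BLAKE2b hashing are assumptions chosen for illustration only.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min sketch (a classical baseline, not UCL-sketch)."""

    def __init__(self, width=64, depth=4):
        # width and depth are hypothetical defaults for this example
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # One hash function per row, derived by salting BLAKE2b with the row number
        digest = hashlib.blake2b(
            key.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, key, count=1):
        # Each arriving item increments one counter in every row
        for r in range(self.depth):
            self.rows[r][self._index(key, r)] += count

    def estimate(self, key):
        # Hash collisions can only inflate counters, so the minimum over
        # the rows is the tightest available upper bound on the true count
        return min(self.rows[r][self._index(key, r)] for r in range(self.depth))


# A tiny stream: "a" occurs 5 times, "b" 3 times, "c" once
stream = ["a"] * 5 + ["b"] * 3 + ["c"]
sketch = CountMinSketch()
for item in stream:
    sketch.update(item)

for key in ("a", "b", "c"):
    print(key, sketch.estimate(key))
```

Because counters are shared between keys, the estimates never fall below the true frequencies but may exceed them when keys collide, which is one source of the coarse accuracy under tight memory that the summary refers to.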