Sharp Frequency Bounds for Sample-Based Queries
Main authors: | , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Summary: | In 2019 IEEE Big Data, pages 5983-5985, 2019. A data sketch algorithm scans a big data set and collects a small amount of data, the sketch, which can be used to statistically infer properties of the big data set. Some data sketch algorithms take a fixed-size random sample of a big data set and use that sample to infer frequencies of items that meet various criteria in the big data set. This paper shows how to statistically infer probably approximately correct (PAC) bounds for those frequencies, efficiently, and precisely enough that the frequency bounds are either sharp or off by only one, which is the best possible result without exact computation. |
DOI: | 10.48550/arxiv.2208.06753 |
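
To make the idea of sample-based frequency bounds concrete, here is a minimal Python sketch. It is not the paper's method: it is a plain brute-force inversion of hypergeometric tail probabilities, assuming the sample is drawn uniformly without replacement. The names `frequency_bounds`, `N` (data set size), `n` (sample size), `k` (matching items in the sample), and `delta` (allowed failure probability) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(N, K, n, x):
    """P(exactly x matches in a sample of n drawn without replacement
    from N items, K of which match). Exact rational arithmetic."""
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible outcomes get probability 0 automatically.
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))


def frequency_bounds(N, n, k, delta):
    """Two-sided bounds on the unknown count K of matching items in the
    full data set, given k matches in a sample of size n.

    A candidate K is kept if observing k is not in either delta/2 tail.
    This linear scan is an illustrative assumption, not an efficient
    algorithm; it is only practical for small N.
    """
    half = Fraction(delta) / 2
    # K must cover the k observed matches and leave room for the
    # n - k observed non-matches.
    feasible = range(k, N - (n - k) + 1)

    def lower_tail(K):  # P(X <= k | K)
        return sum(hypergeom_pmf(N, K, n, x) for x in range(0, k + 1))

    def upper_tail(K):  # P(X >= k | K)
        return sum(hypergeom_pmf(N, K, n, x) for x in range(k, n + 1))

    lower = min(K for K in feasible if upper_tail(K) >= half)
    upper = max(K for K in feasible if lower_tail(K) >= half)
    return lower, upper


if __name__ == "__main__":
    # 5 of 20 sampled items match, out of 200 items in total.
    print(frequency_bounds(200, 20, 5, 0.05))
```

Exact fractions avoid floating-point error at the tail threshold, at the cost of speed. A practical implementation would replace the linear scan with a search that exploits the monotonicity of the tails in `K`.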