Sampling and Reconstruction Using Bloom Filters

Bibliographic details
Published in:	IEEE Transactions on Knowledge and Data Engineering, 2018-07, Vol. 30 (7), pp. 1324-1337
Main authors:	Sengupta, Neha; Bagchi, Amitabha; Bedathur, Srikanta; Ramanath, Maya
Format:	Article
Language:	English
Subjects:	Arrays; Bloom filters; Data structures; Dictionaries; Electronic mail; Hash based algorithms; Indexes; large sets; Reconstruction; sampling; Sampling methods; Structural hierarchy; Twitter

Description
Summary:	In this paper, we address the problem of sampling from a set and reconstructing a set stored as a Bloom filter. To the best of our knowledge, our work is the first to address this question. We introduce a novel hierarchical data structure called BloomSampleTree that helps us design efficient algorithms to extract an almost uniform sample from the set stored in a Bloom filter and also allows us to reconstruct the set efficiently. In the case where the hash functions used in the Bloom filter implementation are partially invertible, in the sense that it is easy to calculate the set of elements that map to a particular hash value, we propose a second, more space-efficient method called HashInvert for the reconstruction. We study the properties of these two methods both analytically and experimentally. We provide bounds on run times for both methods and on sample quality for the BloomSampleTree-based algorithm, and show through an extensive experimental evaluation that our methods are efficient and effective.
ISSN:	1041-4347 1558-2191
DOI:	10.1109/TKDE.2017.2785803