Multi-Granularity Locality-Sensitive Bloom Filter

Bibliographic details
Published in: IEEE Transactions on Computers, 2015-12, Vol. 64 (12), pp. 3500-3514
Main authors: Qian, Jiangbo; Zhu, Qiang; Chen, Huahui
Format: Article
Language: English
Description
Abstract: In many applications, such as homeland security, image processing, social networks, and bioinformatics, it is often required to support an approximate membership query (AMQ) to answer a question like "is a query object q near at least one of the objects in the given data set Ω?" However, existing techniques for processing AMQs require a key parameter, i.e., the distance value, to be defined in advance for the query processing. In this paper, we propose a novel filter, called the multi-granularity locality-sensitive Bloom filter (MLBF), which can process AMQs with multiple distance granularities. Specifically, the MLBF is composed of two Bloom filters (BFs): the basic multi-granularity locality-sensitive BF (BMLBF) and the multi-granularity verification BF (MVBF). The BMLBF is used to store the data objects. It adopts an alignable locality-sensitive hashing (LSH) function family to support multiple granularities. The MVBF is used to reduce the false positive rate of the MLBF. The false negative rate of the MLBF is reduced by applying AND-constructions followed by an OR-construction. In addition, based on the MLBF structure, we suggest a more space-effective variant, called the MLBF*, to further reduce space cost. Theoretical analyses for estimating the false positive/negative rates of the MLBF/MLBF* are given. Experiments using synthetic and real data show that the theoretical estimates are quite accurate, and that the MLBF/MLBF* technique can handle AMQs with low false positive and negative rates for multiple distance granularities.
ISSN: 0018-9340, 1557-9956
DOI: 10.1109/TC.2015.2401011
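
Illustrative sketch (not taken from the paper): the abstract mentions an alignable LSH family and AND-constructions followed by an OR-construction. The Python below is a minimal sketch of those general ideas under assumptions of our own. It uses the generic projection hash floor((a.v + b) / width), doubles the bucket width at each coarser level so that coarse buckets are unions of fine ones, and keeps one bit array per level. The class name LSBloomSketch, all parameter names and defaults, and the width-doubling scheme are hypothetical. The paper's actual function family, the MVBF verification filter, and the MLBF* variant are not modeled here.

```python
import math
import random


class LSBloomSketch:
    """Toy locality-sensitive Bloom filter with nested distance granularities."""

    def __init__(self, dim, base_width=1.0, levels=3, k=2, groups=8,
                 bits=1 << 16, seed=0):
        rng = random.Random(seed)
        self.base_width = base_width
        self.levels = levels
        self.bits = bits
        # One bit array per granularity level (a bytearray used as 0/1 flags).
        self.arrays = [bytearray(bits) for _ in range(levels)]
        # `groups` groups of `k` projections h(v) = floor((a.v + b) / width).
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)],
              rng.uniform(0.0, base_width)) for _ in range(k)]
            for _ in range(groups)
        ]

    def bucket(self, g, j, v, level):
        """Bucket index of v under function j of group g at the given level."""
        a, b = self.funcs[g][j]
        # Assumption: each coarser level doubles the bucket width, so a
        # level+1 bucket is exactly the union of two adjacent level buckets.
        width = self.base_width * (2 ** level)
        proj = sum(ai * vi for ai, vi in zip(a, v)) + b
        return math.floor(proj / width)

    def _position(self, g, j, v, level):
        # Map (function id, bucket) to a bit position; tuples of ints hash
        # deterministically in CPython.
        return hash((g, j, self.bucket(g, j, v, level))) % self.bits

    def insert(self, v):
        """Set the bits for v in every granularity level."""
        for level in range(self.levels):
            for g in range(len(self.funcs)):
                for j in range(len(self.funcs[g])):
                    self.arrays[level][self._position(g, j, v, level)] = 1

    def query(self, v, level):
        """True if v may be near a stored object at this granularity."""
        bits = self.arrays[level]
        return any(  # OR over groups
            all(bits[self._position(g, j, v, level)]  # AND inside a group
                for j in range(len(self.funcs[g])))
            for g in range(len(self.funcs))
        )


f = LSBloomSketch(dim=2)
f.insert((0.0, 0.0))
same_point = [f.query((0.0, 0.0), level) for level in range(f.levels)]
far_point = f.query((1000.0, 1000.0), 0)
```

In this sketch the distance granularity is chosen at query time through the level argument rather than fixed when the filter is built, which is the property the abstract highlights. Requiring all k bits inside a group makes a match stricter, while accepting any one of the groups gives a near object several chances to be found.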