Performance evaluation on data reconciliation algorithm in distributed system

Bibliographic details
Main authors: Xin Wang, Hongming Zhu, Qin Liu, Xiaowen Yang, Jiakai Xiao
Format: Conference paper
Language: English
Description
Abstract: This research arose from a school-enterprise cooperation program that aims to improve the efficiency of data reconciliation between two large-scale data sources. The paper presents three typical algorithms: the standard Bloom filter (BF), the counting Bloom filter (CBF), and the invertible Bloom filter (IBF). To evaluate their performance, mainly in terms of runtime and accuracy rate, a series of experiments was designed and run on both a small-scale and a large-scale distributed system. The algorithms are compared with one traditional query method, Inner Join (IJ). The results show that, under the MapReduce computing framework, Inner Join performs best, followed closely by BF, and that a large-scale distributed system evidently improves performance when dealing with large-scale data.
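
The contrast between Inner Join and a standard Bloom filter can be illustrated with a minimal single-machine sketch. It is not the authors' implementation and does not model MapReduce; the class `BloomFilter`, the helper functions, the key format, and the filter parameters (`num_bits`, `num_hashes`) are all assumptions chosen for illustration. The sketch builds a filter over the keys of one source and probes it with the keys of the other, then compares the candidate set with the exact join.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter over string keys (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str):
        # Double hashing: derive k positions from two halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a nonzero step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(key))


def inner_join_keys(source_a, source_b):
    """Exact baseline: keys present in both sources."""
    return set(source_a) & set(source_b)


def bloom_candidate_keys(source_a, source_b, num_bits=4096, num_hashes=3):
    """Keys of source_b that the filter built from source_a reports as present."""
    bf = BloomFilter(num_bits, num_hashes)
    for key in source_a:
        bf.add(key)
    return {key for key in source_b if key in bf}


# Hypothetical data: two sources sharing the keys order-50 .. order-99.
source_a = [f"order-{i}" for i in range(100)]
source_b = [f"order-{i}" for i in range(50, 150)]

exact = inner_join_keys(source_a, source_b)
candidates = bloom_candidate_keys(source_a, source_b)
false_positives = candidates - exact  # accuracy cost of the filter

print(len(exact), len(candidates), len(false_positives))
```

The candidate set always contains the exact join, because a Bloom filter has no false negatives; any extra keys are false positives, which is one way an accuracy rate of the kind mentioned in the abstract could be measured. A counting or invertible Bloom filter would replace the bit array with counters or key-carrying cells and is not shown here.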
ISSN: 2376-5933, 2376-595X
DOI: 10.1109/CCIS.2012.6664432