A Hybrid Join Algorithm on Top of Map Reduce

Hadoop has shown great power in processing vast data in parallel. Hive, the database on Hadoop, enables more experts to process relational data by providing sql-like interface. However, Hive does not provide an efficient approach for join, a common but expensive operator in relational database. Due...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Weisong Hu, Ma, Lili, Xiaowei Liu, Hongwei Qi, Li Zha, Huaming Liao, Yuezhuo Zhang
Format:	Tagungsbericht
Sprache:	eng
Schlagworte:	auto-tuning Hadoop MapReduce partition join Semantics
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Hadoop has shown great power in processing vast data in parallel. Hive, the database on Hadoop, enables more experts to process relational data by providing sql-like interface. However, Hive does not provide an efficient approach for join, a common but expensive operator in relational database. Due to the importance of join, this paper proposes a novel hybrid algorithm, HJA, which can help to automatically choose the relatively better one among several methods, divide and memory copy merge, Partition Join(PJ) and naïve Hive join. Experiments show that HJA can get best performance in most situations.
DOI:	10.1109/SKG.2011.13