HC-Store: putting MapReduce’s foot in two camps

Bibliographic Details
Published in:	Frontiers of Computer Science 2014-12, Vol.8 (6), p.859-871
Main authors:	WANG, Huiju; LI, Furong; ZHOU, Xuan; CAO, Yu; QIN, Xiongpai; CHEN, Jidong; WANG, Shan
Format:	Article
Language:	eng
Subjects:	column-store; Computer Science; cost model; Data analysis; Hadoop; HC-store; Hybrid systems; MapReduce; PAX-store; Research Article
Online access:	Full text

Description
Summary:	MapReduce is a popular framework for large-scale data analysis. As data access is critical for MapReduce’s performance, some recent work has applied different storage models, such as column-store or PAX-store, to MapReduce platforms. However, the data access patterns of different queries are very different. No storage model is able to achieve the optimal performance alone. In this paper, we study how MapReduce can benefit from the presence of two different column-store models: pure column-store and PAX-store. We propose a hybrid storage system called hybrid column-store (HC-store). Based on the characteristics of the incoming MapReduce tasks, our storage model can determine whether to access the underlying pure column-store or PAX-store. We studied the properties of the different storage models and created a cost model to decide the data access strategy at runtime. We have implemented HC-store on top of Hadoop. Our experimental results show that HC-store is able to outperform PAX-store and column-store, especially when confronted with diverse workloads.
ISSN:	2095-2228, 2095-2236
DOI:	10.1007/s11704-014-3376-3