Using adaptive resource allocation to implement an elastic MapReduce framework

Bibliographic Details
Published in: Software: Practice & Experience, 2017-03, Vol. 47 (3), pp. 349-360
Main Authors: Zhao, Jiaqi; Xue, Changlong; Tao, Xinlin; Zhang, Shugong; Tao, Jie
Format: Article
Language: English
Online Access: Full text
Description
Abstract: Today, we are observing a transition of science paradigms from computational science to data-intensive science. With the exponential increase of input and intermediate data, more applications are being developed using the MapReduce programming model, which is regarded as an appropriate programming model for analysing large data sets. A MapReduce framework runs its applications on a cluster, where the computing capacity allocated to the applications is limited and may not meet their runtime resource demand. In this case, the Map/Reduce tasks have to wait in queues, and the applications suffer from poor performance. This work develops an autonomic resource manager within the Hadoop MapReduce framework. The manager is capable of detecting overloading or under-loading of the resources allocated to its user community. In the former case, it requests more resources from, for example, the batch system of a High Performance Computing (HPC) cluster or a computing cloud and, on acquisition, integrates the additional resources into the Hadoop MapReduce runtime. In the latter case, the manager gives the free resources back to their source. We extended the existing Hadoop MapReduce resource manager to implement the proposed strategy and validated the concept on an HPC cluster with standard benchmark applications. Experimental results show a significant performance gain, for example, an up to 45% improvement in execution time when running multiple applications. Copyright © 2016 John Wiley & Sons, Ltd.
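The decision loop the abstract describes (detect overload, request extra resources from an HPC batch system or cloud; detect under-load, return surplus resources) can be sketched as follows. This is a minimal illustration with hypothetical names and watermark thresholds, not the authors' actual implementation or the Hadoop resource-manager API:

```python
# Sketch of an autonomic resource manager's grow/shrink decision, assuming a
# simple queue-length-to-slots ratio as the load signal and hypothetical
# watermark thresholds (both are illustrative choices, not from the paper).

class ElasticResourceManager:
    def __init__(self, allocated_slots, high_watermark=1.5, low_watermark=0.5):
        self.allocated_slots = allocated_slots  # Map/Reduce slots currently held
        self.high = high_watermark              # ratio above this -> overloaded
        self.low = low_watermark                # ratio below this -> under-loaded

    def decide(self, queued_tasks):
        """Return ('grow', n), ('shrink', n), or ('hold', 0)."""
        ratio = queued_tasks / self.allocated_slots
        if ratio > self.high:
            # Overloaded: request enough extra slots (e.g. from a batch system
            # or a cloud) to bring the ratio back down to the high watermark.
            extra = queued_tasks - int(self.high * self.allocated_slots)
            return ("grow", extra)
        if ratio < self.low:
            # Under-loaded: give surplus slots back to their source,
            # keeping at least one slot per queued task (and at least one total).
            surplus = self.allocated_slots - max(1, queued_tasks)
            return ("shrink", surplus)
        return ("hold", 0)

manager = ElasticResourceManager(allocated_slots=10)
print(manager.decide(queued_tasks=20))  # ('grow', 5)
print(manager.decide(queued_tasks=3))   # ('shrink', 7)
print(manager.decide(queued_tasks=12))  # ('hold', 0)
```

A real implementation would additionally track in-flight acquisition requests and integrate or decommission nodes asynchronously, as the paper's extension to the Hadoop resource manager presumably does.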
ISSN: 0038-0644
eISSN: 1097-024X
DOI: 10.1002/spe.2398