The Memory Challenge in Reduce Phase of MapReduce Applications

Bibliographic Details
Published in: IEEE Transactions on Big Data, 2016-12, Vol. 2 (4), pp. 380-386
Authors: Nabavinejad, Seyed Morteza; Goudarzi, Maziar; Mozaffari, Shirin
Format: Article
Language: English
Description
Abstract: MapReduce has become a popular paradigm for Big Data processing. Each MapReduce application has two phases, Map and Reduce, and each phase consists of several tasks that follow a default sequence of processing steps. It is common practice to set the number of Map tasks equal to the number of data blocks in the input data. However, there is no specific rule for determining the number of Reduce tasks based on the amount of intermediate data generated by the Map tasks or the specifications of the machines that execute the tasks. Since Reduce tasks bring data into memory for processing, this can lead to inefficient execution of the application, and even application failure, because memory runs short while the intermediate data is temporarily held. In this work, we first evaluate this challenge and show its significance. To address it, we propose Mnemonic, a memory-aware approach. Mnemonic leverages a profiling mechanism to characterize the application's behavior regarding intermediate data generation. It first decides the amount of memory to dedicate to each Reduce slot, and then determines the number of Reduce tasks based on the information gathered through profiling and the chosen memory size per Reduce slot. Experimental results using the PUMA benchmark suite indicate that our proposed memory-aware approach can 1) completely remove the likelihood of application failure due to out-of-memory errors, and 2) decrease the execution time of the Reduce phase by up to 58.27, 79.36, and 88.79 percent compared with the Memory Oblivious, Fine Grain 1, and Fine Grain 2 approaches, respectively.
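
As a rough illustration of the sizing idea the abstract describes, the Python sketch below derives a Reduce-task count from a profiled intermediate-data volume and a chosen per-slot memory budget. All names here (estimate_reduce_tasks, map_output_ratio, and the example numbers) are hypothetical; the paper's actual Mnemonic profiling and sizing policy is not reproduced here.

    import math

    def estimate_reduce_tasks(profiled_input_bytes: int,
                              map_output_ratio: float,
                              reduce_slot_memory_bytes: int) -> int:
        """Hypothetical memory-aware Reduce-task count.

        Assumes profiling yields the ratio of intermediate data to
        input data (map_output_ratio); the real Mnemonic mechanism
        in the paper may differ.
        """
        # Estimated total intermediate data produced by the Map phase.
        intermediate_bytes = profiled_input_bytes * map_output_ratio
        # Use enough Reduce tasks that each task's share of the
        # intermediate data fits in one Reduce slot's memory.
        return max(1, math.ceil(intermediate_bytes / reduce_slot_memory_bytes))

    # Example: 100 GB of input, a Map phase that emits 1.5x the input
    # size, and 2 GB of memory dedicated to each Reduce slot.
    print(estimate_reduce_tasks(100 * 2**30, 1.5, 2 * 2**30))  # -> 75

The point of the sketch is the direction of the dependency: the task count follows from the profiled data volume and the per-slot memory budget, rather than being a fixed configuration value as in stock MapReduce deployments.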
ISSN: 2332-7790, 2372-2096
DOI: 10.1109/TBDATA.2016.2607756