Historical data based approach to mitigate stragglers from the Reduce phase of MapReduce in a heterogeneous Hadoop cluster
Published in: | Cluster Computing, 2022-10, Vol. 25 (5), pp. 3193-3211 |
---|---|
Authors: | , , |
Format: | Article |
Language: | English |
Keywords: | |
Abstract: | Hadoop MapReduce processes data on a cluster of commodity hardware nodes in two phases, using Map and Reduce tasks. Yet Another Resource Negotiator (YARN), a dynamic resource manager, allocates resources for Map tasks while preserving data locality. In contrast, it allocates resources for Reduce tasks on any node in the cluster. This policy performs well in a homogeneous environment, where all nodes have identical computing capabilities. In a heterogeneous environment, however, its performance degrades: placing Reduce-task containers on arbitrary nodes, including slower ones, slows down Reduce task execution and leads to computational skew. To mitigate computational skew in the Reduce phase of MapReduce, we propose the Historical data based Reduce tasks scheduling (HDRTS) technique. The technique consists of two algorithms. The first computes the node average response time (NART) of each node from job history information. The second allocates resources on faster processing nodes (FPN) to schedule the Reduce tasks. We evaluated the policy’s performance with the widely used HiBench benchmark suite. Compared with Hadoop’s default policy and several other policies, HDRTS reduces the Reduce task execution time for reduce-input-heavy jobs by approximately 25% to 37%. As a result, it mitigates computational skew and stragglers in the Reduce phase of MapReduce in heterogeneous environments. |
---|---|
ISSN: | 1386-7857, 1573-7543 |
DOI: | 10.1007/s10586-021-03530-x |
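
The abstract describes HDRTS only at a high level. As a rough illustration of its two steps, the Python sketch below computes a per-node average response time (NART) from a list of history records and then prefers faster nodes when placing a Reduce task. The record format `(node, response_time_seconds)`, the function names, and the rule that a node counts as a faster processing node (FPN) when its NART is at or below the cluster-wide mean NART are all hypothetical assumptions for this example, not the paper's definitions; a real implementation would parse Hadoop job history and request containers through YARN.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical job-history record: (node_name, task_response_time_seconds).
# This format is an assumption; HDRTS itself interprets Hadoop job history.


def node_average_response_time(history):
    """Return {node: mean response time} over the completed tasks in `history`."""
    times = defaultdict(list)
    for node, response_time in history:
        times[node].append(response_time)
    return {node: mean(ts) for node, ts in times.items()}


def faster_processing_nodes(nart):
    """Assumed FPN rule: nodes whose NART is at or below the cluster-wide mean
    NART, returned fastest first."""
    threshold = mean(nart.values())
    return sorted((n for n, t in nart.items() if t <= threshold),
                  key=lambda n: nart[n])


def pick_reduce_node(nart, free_containers):
    """Choose a node for one Reduce task: the fastest FPN with a free container,
    otherwise the fastest node with any free container, otherwise None."""
    for node in faster_processing_nodes(nart):
        if free_containers.get(node, 0) > 0:
            return node
    candidates = [n for n, c in free_containers.items() if c > 0]
    if not candidates:
        return None
    # Nodes with no history are ranked last (treated as infinite NART).
    return min(candidates, key=lambda n: nart.get(n, float("inf")))


if __name__ == "__main__":
    history = [("node1", 40.0), ("node1", 44.0), ("node2", 90.0),
               ("node2", 110.0), ("node3", 60.0)]
    nart = node_average_response_time(history)
    print(nart)  # {'node1': 42.0, 'node2': 100.0, 'node3': 60.0}
    # node1 is the fastest FPN but has no free container, so node3 is chosen.
    print(pick_reduce_node(nart, {"node1": 0, "node2": 2, "node3": 1}))  # node3
```

The fallback to any node with a free container mirrors the idea that scheduling should not stall when all faster nodes are busy; whether HDRTS waits for an FPN or falls back is not stated in the abstract, so this choice is also an assumption.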