Early straggler tasks detection by recurrent neural network in a heterogeneous environment

Bibliographic details
Published in: Applied intelligence (Dordrecht, Netherlands), 2023-04, Vol. 53 (7), p. 7369-7389
Main authors: Bawankule, Kamalakant Laxman; Dewang, Rupesh Kumar; Singh, Anil Kumar
Format: Article
Language: English
Online access: Full text
Description
Summary: Heterogeneity is common in parallel and distributed environments used for extensive computations, such as MapReduce. Stragglers are tasks running on poorly performing nodes in the cluster. Detecting them early is challenging in such environments. In previously proposed approaches, late detection of straggler tasks and estimation of the time to end (TTE) for all tasks running in a heterogeneous environment delay the entire job execution. Early straggler detection helps to speculate a task at an early stage of its execution, which indirectly improves the overall job execution. This article proposes early straggler detection by a recurrent neural network (ESDRNN), which collects task and node information from the ApplicationMaster every three seconds to train the RNN. An RNN is a type of artificial neural network widely used for processing sequential time-series data. The RNN classifies straggler tasks early, between thirty and forty seconds into task execution, and transfers the list of classified tasks to an agent running on the ResourceManager. The agent then predicts the TTE of these classified tasks with an autoregressive integrated moving average (ARIMA) model. Finally, it sorts and refreshes the list every ten seconds so that tasks with higher TTE come first, and speculates those tasks for early completion of the MapReduce job. The technique's performance is evaluated on the HiBench benchmark suite, one of the most popular benchmarks for Hadoop. Compared with the default speculation technique and other techniques, the proposed technique detects stragglers early, within 35 to 40 seconds of task execution. As a result, it significantly decreases job execution time, by an average of 21% to 38% across different workloads in a heterogeneous Hadoop cluster.
ISSN:0924-669X
1573-7497
DOI:10.1007/s10489-022-03837-1
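
The ranking step described in the summary (estimate the TTE of each classified straggler, then put the highest TTE first) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the ARIMA forecast is replaced here by a plain average-progress-rate extrapolation so that the example needs only the standard library, and the names `TaskTrace`, `estimate_tte`, and `rank_for_speculation`, as well as the task IDs, are hypothetical. Only the three-second sampling interval is taken from the summary.

```python
from dataclasses import dataclass
from typing import List, Tuple

SAMPLE_INTERVAL_S = 3.0  # sampling period stated in the summary


@dataclass
class TaskTrace:
    """Hypothetical record: progress fractions in [0, 1], one per sample."""
    task_id: str
    progress: List[float]


def estimate_tte(trace: TaskTrace, window: int = 5) -> float:
    """Estimate time to end (seconds) from the recent average progress rate.

    Assumption: simple linear extrapolation, used as a stand-in for the
    ARIMA model named in the summary.
    """
    recent = trace.progress[-window:]
    if len(recent) < 2:
        return float("inf")  # not enough samples to estimate a rate
    elapsed = (len(recent) - 1) * SAMPLE_INTERVAL_S
    rate = (recent[-1] - recent[0]) / elapsed
    if rate <= 0:
        return float("inf")  # a stalled task never finishes at this rate
    return (1.0 - recent[-1]) / rate


def rank_for_speculation(traces: List[TaskTrace]) -> List[Tuple[str, float]]:
    """Return (task_id, TTE) pairs sorted with the highest TTE first."""
    scored = [(t.task_id, estimate_tte(t)) for t in traces]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    traces = [
        TaskTrace("map_1", [0.0, 0.1, 0.2, 0.3]),
        TaskTrace("map_2", [0.0, 0.02, 0.04, 0.06]),
        TaskTrace("map_3", [0.1, 0.1, 0.1]),
    ]
    for task_id, tte in rank_for_speculation(traces):
        print(task_id, tte)
```

In this sketch a stalled task receives an infinite TTE and is therefore ranked first. A real agent would rerun the ranking on every refresh (every ten seconds, per the summary) and would need a forecasting model that copes with noisy progress reports.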