Using Imbalance Characteristic for Fault-Tolerant Workflow Scheduling in Cloud Systems

Resubmission and replication are two fundamental and widely recognized techniques in distributed computing systems for fault tolerance. The resubmission based strategy has an advantage in resource utilization, while the replication based strategy can reduce the task completed time in the context of...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on parallel and distributed systems 2017-12, Vol.28 (12), p.3671-3683
Hauptverfasser: Yao, Guangshun, Ding, Yongsheng, Hao, Kuangrong
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Resubmission and replication are two fundamental and widely recognized techniques in distributed computing systems for fault tolerance. The resubmission based strategy has an advantage in resource utilization, while the replication based strategy can reduce the task completed time in the context of fault. However, few researches take these two techniques together for fault-tolerant workflow scheduling, especially in Cloud systems. In this paper, we present a novel fault-tolerant workflow scheduling (ICFWS) algorithm for Cloud systems by combining the aforementioned two strategies together to play their respective advantages for fault tolerance while trying to meet the soft deadline of workflow. First, it divides the soft deadline of workflow into multiple sub-deadlines for all tasks. Then, it selects a reasonable fault-tolerant strategy and reserves suitable resource for each task by taking the imbalance sub-deadlines among tasks and on-demand resource provisioning of Cloud systems into consideration. Finally, an online scheduling and reservation adjustment scheme is designed to select a suitable resource for the task with resubmission strategy and adjust the sub-deadlines as well as fault-tolerant strategies of some unexecuted tasks during the task execution process, respectively. The proposed algorithm is evaluated on both real-world and randomly generated workflows. The results demonstrate that the ICFWS outperforms some well-known approaches on corresponding metrics.
ISSN:1045-9219
1558-2183
DOI:10.1109/TPDS.2017.2687923