MULTS: A multi-cloud fault-tolerant architecture to manage transient servers in cloud computing
Published in: | Journal of Systems Architecture, 2019-12, Vol. 101, p. 101651, Article 101651 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Abstract: | Highlights: • An architecture that provides an efficient way to use transient servers in the cloud. • Use of a scenario-based optimal checkpoint to guarantee execution and reduce user costs. • Experiments used 21 million price changes collected from Amazon AWS spot instances. • Experiments created a knowledge database with approximately 110 million records. • Prediction accuracy reached 92%, demonstrating the potential of the approach.
The large-scale use of cloud computing resources has made the reliability of cloud environments an important issue. In addition, cloud providers are selling unreliable virtual machines by exploiting unused resources and offering them as transient servers: lower-priced virtual machines whose resources can be revoked without user intervention. To increase the availability of transient servers, we propose a multi-cloud fault-tolerant architecture that provides a resilient environment, using a scenario-based optimal checkpoint scheme to guarantee the execution of running processes at reduced user cost. The architecture combines a heuristic that extracts information from a case-based reasoning (CBR) knowledge base with a statistical model that predicts failure events, and uses both to refine fault-tolerance parameters. The result is a cloud environment with higher reliability and shorter execution times. Extensive simulations show high accuracy, with a survival prediction success rate of up to 92%, and a 74.58% reduction in execution time for long-running applications. The results are promising, indicating that the proposed architecture can prevent revocation failures under realistic working conditions. |
ISSN: | 1383-7621 1873-6165 |
DOI: | 10.1016/j.sysarc.2019.101651 |
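The abstract names two ingredients, predicting whether a transient server will survive and tuning checkpoints to that prediction, without showing how they fit together. The Python sketch below illustrates the idea under stated assumptions. It is not the authors' method: the paper combines a CBR heuristic with a statistical model. This sketch substitutes a simple empirical estimator over a spot-price trace and Young's first-order approximation of the optimal checkpoint interval, T ≈ sqrt(2 · C · MTTR). It assumes the bid-based spot model, in which an instance is revoked when the market price rises above the user's bid. The trace, bid, checkpoint cost, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical spot-price trace: (time in hours, price in USD/hour), sorted by time.
# Assumption: the price stays constant until the next entry. Illustrative data only.
PriceTrace = List[Tuple[float, float]]

EXAMPLE_TRACE: PriceTrace = [
    (0, 0.10), (1, 0.12), (2, 0.30), (3, 0.11),
    (4, 0.11), (5, 0.40), (6, 0.10), (8, 0.10),
]


def mean_time_to_revocation(trace: PriceTrace, bid: float) -> float:
    """Hours of availability per revocation: total time with price <= bid
    divided by the number of times the price rises above the bid."""
    up_time, revocations = 0.0, 0
    for (t0, p0), (t1, p1) in zip(trace, trace[1:]):
        if p0 <= bid:
            up_time += t1 - t0      # instance would be running during [t0, t1)
            if p1 > bid:
                revocations += 1    # price crosses above the bid at t1: revocation
    return up_time / revocations if revocations else math.inf


def survival_probability(trace: PriceTrace, bid: float, horizon_h: float) -> float:
    """Fraction of launch points (trace entries with price <= bid) whose next
    `horizon_h` hours contain no price above the bid. Windows that run past the
    end of the trace are skipped because their outcome is unknown."""
    if not trace:
        return 0.0
    end_time = trace[-1][0]
    starts = survived = 0
    for i, (t0, p0) in enumerate(trace):
        if p0 > bid or t0 + horizon_h > end_time:
            continue
        starts += 1
        if not any(p > bid for t, p in trace[i + 1:] if t < t0 + horizon_h):
            survived += 1
    return survived / starts if starts else 0.0


def young_interval(checkpoint_cost_h: float, mttr_h: float) -> float:
    """Young's first-order approximation of the optimal checkpoint interval (hours)."""
    return math.sqrt(2 * checkpoint_cost_h * mttr_h)


if __name__ == "__main__":
    bid, job_h, ckpt_cost_h = 0.20, 2.0, 1 / 6  # hypothetical scenario parameters
    mttr = mean_time_to_revocation(EXAMPLE_TRACE, bid)
    p_survive = survival_probability(EXAMPLE_TRACE, bid, job_h)
    print(f"MTTR = {mttr:.2f} h, P(survive {job_h} h) = {p_survive:.2f}")
    print(f"checkpoint every {young_interval(ckpt_cost_h, mttr):.2f} h")
```

For the toy trace with a 0.20 USD/hour bid, this reports a mean time to revocation of 3 h and a 0.6 probability of surviving a 2-hour window. With a 10-minute checkpoint cost, it suggests a 1-hour checkpoint interval. A scheduler built on this idea might skip checkpointing when the survival probability for the whole job is high and fall back to the computed interval otherwise. That policy is likewise an illustrative assumption, not the paper's rule.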