FarSpot: Optimizing Monetary Cost for HPC Applications in the Cloud Spot Market

Recently, we have witnessed many HPC applications developed and hosted in the cloud, which can benefit from the elastic and diversified resources on the cloud, while on the other hand confronting high costs for executing the long-running HPC applications. Although public clouds such as Amazon EC2 of...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on parallel and distributed systems 2022-11, Vol.33 (11), p.2955-2967
Hauptverfasser: Zhou, Amelie Chi, Lao, Jianming, Ke, Zhoubin, Wang, Yi, Mao, Rui
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Recently, we have witnessed many HPC applications developed and hosted in the cloud, which can benefit from the elastic and diversified resources on the cloud, while on the other hand confronting high costs for executing the long-running HPC applications. Although public clouds such as Amazon EC2 offer spot instances with dynamic and usually low prices compared to on-demand ones, the spot prices can vary significantly and sometimes can even be more expensive than on-demand prices of the same type. Previous work on reducing the monetary cost for HPC applications using spot instances focused on designing fault tolerance techniques or selecting appropriate instance types/bid prices to make good usage of the low spot prices. However, with the recent update of spot pricing model on Amazon EC2, these work may become either inefficient or invalid. In this article, we present FarSpot which is an optimization framework for HPC applications in the latest cloud spot market with the goal of minimizing application cost while ensuring performance constraints. FarSpot provides accurate long-term price prediction for a wide range of spot instance types using ensemble-based learning method. It further incorporates a cost-aware deadline assignment algorithm to distribute application deadline to each task according to spot price changes. With the assigned subdeadline of each task, FarSpot dynamically migrates tasks among spot instances to reduce execution cost. Evaluation results using real HPC benchmark show that 1) the prediction error of FarSpot is very low (below 3%), 2) FarSpot reduced the monetary cost by 32% on average compared to state-of-the-art algorithms, and 3) FarSpot satisfies the user-specified deadline constraints at all time.
ISSN:1045-9219
1558-2183
DOI:10.1109/TPDS.2021.3134644