End-to-end online performance data capture and analysis for scientific workflows

With the increased prevalence of employing workflows for scientific computing and a push towards exascale computing, it has become paramount that we are able to analyze characteristics of scientific applications to better understand their impact on the underlying infrastructure and vice-versa. Such...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Future generation computer systems 2020-12, Vol.117
Hauptverfasser: Papadimitriou, George, Wang, Cong, Vahi, Karan, da Silva, Rafael Ferreira, Mandal, Anirban, Liu, Zhengchun, Mayani, Rajiv, Rynge, Mats, Kiran, Mariam, Lynch, Vickie E., Kettimuthu, Rajkumar, Deelman, Ewa, Vetter, Jeffrey S., Foster, Ian
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:With the increased prevalence of employing workflows for scientific computing and a push towards exascale computing, it has become paramount that we are able to analyze characteristics of scientific applications to better understand their impact on the underlying infrastructure and vice-versa. Such analysis can help drive the design, development, and optimization of these next generation systems and solutions. In this paper, we present the architecture, integrated with existing well-established and newly developed tools, to collect online performance statistics of workflow executions from various, heterogeneous sources and publish them in a distributed database (Elasticsearch). Using this architecture, we are able to correlate online workflow performance data, with data from the underlying infrastructure, and present them in a useful and intuitive way via an online dashboard. We have validated our approach by executing two classes of real-world workflows, both under normal and anomalous conditions. The first is an I/O-intensive genome analysis workflow; the second, a CPUand memory-intensive material science workflow. Based on the data collected in Elasticsearch, we are able to demonstrate that we can correctly identify anomalies that we injected. The resulting end-to-end data collection of workflow performance data is an important resource of training data for automated machine learning analysis.
ISSN:0167-739X
1872-7115