Online workflow management and performance analysis with Stampede


Bibliographic details
Main authors: Gunter, D., Deelman, E., Samak, T., Brooks, C. H., Goode, M., Juve, G., Mehta, G., Moraes, P., Silva, F., Swany, M., Vahi, K.
Format: Conference paper
Language: English
Description
Abstract: Scientific workflows enable complex scientific analyses. They provide both a portable representation and a foundation upon which results can be validated and shared. Large-scale scientific workflows are executed on equally complex parallel and distributed resources, where many things can fail. Application scientists need to track the status of their workflows in real time, detect execution anomalies automatically, and troubleshoot problems without logging into remote nodes or searching through thousands of log files. As part of the NSF Stampede project, we have developed an infrastructure to meet these needs. The infrastructure captures application-level logs and resource information, normalizes them to standard representations, and stores them in a centralized, general-purpose schema. Higher-level tools mine the logs in real time to determine current status, predict failures, and detect anomalous performance.
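The abstract describes a two-stage pipeline: raw log lines are normalized into structured records, and higher-level tools then mine those records for anomalies. The Python sketch below illustrates that idea under stated assumptions only. The `key=value` line format, the event names `job.start` and `job.end`, the `JobEvent` record, and the helpers `normalize`, `durations`, and `anomalies` are all hypothetical and are not Stampede's actual schema or API. The z-score rule is a deliberately simple stand-in for anomaly detection, not the method used in the paper.

```python
import statistics
from dataclasses import dataclass


@dataclass
class JobEvent:
    """Hypothetical normalized record for one job lifecycle event."""
    job_id: str
    event: str
    ts: float


def normalize(line: str) -> JobEvent:
    """Parse an assumed 'key=value key=value ...' log line into a JobEvent."""
    fields = dict(pair.split("=", 1) for pair in line.split())
    return JobEvent(job_id=fields["job"], event=fields["event"], ts=float(fields["ts"]))


def durations(events):
    """Pair each job's start and end events (in timestamp order) into a runtime."""
    starts = {}
    result = {}
    for ev in sorted(events, key=lambda e: e.ts):
        if ev.event == "job.start":
            starts[ev.job_id] = ev.ts
        elif ev.event == "job.end" and ev.job_id in starts:
            result[ev.job_id] = ev.ts - starts.pop(ev.job_id)
    return result


def anomalies(durs, threshold=2.0):
    """Flag jobs whose runtime lies more than `threshold` sample standard
    deviations from the mean runtime (a simple z-score rule)."""
    values = list(durs.values())
    if len(values) < 2:
        return []
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return []
    return [job for job, d in durs.items() if abs(d - mean) / stdev > threshold]


if __name__ == "__main__":
    # Eight synthetic jobs; the last one runs far longer than the rest.
    lines = []
    for i, d in enumerate([10, 11, 9, 10, 12, 10, 11, 100]):
        lines.append(f"ts=0 event=job.start job=j{i}")
        lines.append(f"ts={d} event=job.end job=j{i}")
    events = [normalize(line) for line in lines]
    print(anomalies(durations(events)))
```

A z-score on the sample standard deviation is easily skewed by the very outliers it is meant to catch, and with very few jobs it cannot exceed the threshold at all. A production system would more likely use a robust statistic (for example, median and median absolute deviation) or a learned model.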
ISSN:2165-9605