Node-Based Job Scheduling for Large Scale Simulations of Short Running Jobs

Diverse workloads such as interactive supercomputing, big data analysis, and large-scale AI algorithm development, requires a high-performance scheduler. This paper presents a novel node-based scheduling approach for large scale simulations of short running jobs on MIT SuperCloud systems, that allow...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:arXiv.org 2021-08
Hauptverfasser: Byun, Chansup, Arcand, William, Bestor, David, Bergeron, Bill, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Jones, Michael, Klein, Anna, Michaleas, Peter, Milechin, Lauren, Mullen, Julie, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Samsi, Siddharth, Yee, Charles, Kepner, Jeremy
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Diverse workloads such as interactive supercomputing, big data analysis, and large-scale AI algorithm development, requires a high-performance scheduler. This paper presents a novel node-based scheduling approach for large scale simulations of short running jobs on MIT SuperCloud systems, that allows the resources to be fully utilized for both long running batch jobs while simultaneously providing fast launch and release of large-scale short running jobs. The node-based scheduling approach has demonstrated up to 100 times faster scheduler performance that other state-of-the-art systems.
ISSN:2331-8422
DOI:10.48550/arxiv.2108.11359