An analytical performance model of generalized hierarchical scheduling

High performance computing (HPC) workflows are undergoing tumultuous changes, including an explosion in size and complexity. Despite these changes, most batch job systems still use slow, centralized schedulers. Generalized hierarchical scheduling (GHS) solves many of the challenges that face modern...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:The international journal of high performance computing applications 2022-05, Vol.36 (3), p.289-306
Hauptverfasser: Herbein, Stephen, Patki, Tapasya, Ahn, Dong H, Mobo, Sebastian, Hathaway, Clark, Caíno-Lores, Silvina, Corbett, James, Domyancic, David, Scogland, Thomas RW, de Supinski, Bronis R, Taufer, Michela
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:High performance computing (HPC) workflows are undergoing tumultuous changes, including an explosion in size and complexity. Despite these changes, most batch job systems still use slow, centralized schedulers. Generalized hierarchical scheduling (GHS) solves many of the challenges that face modern workflows, but GHS has not been widely adopted in HPC. A major difficulty that hinders adoption is the lack of a performance model to aid in configuring GHS for optimal performance on a given application. We propose an analytical performance model of GHS, and we validate our proposed model with four different applications on a moderately-sized system. Our validation shows that our model is extremely accurate at predicting the performance of GHS, explaining 98.7% of the variance (i.e., an R2 statistic of 0.987). Our results also support the claim that GHS overcomes scheduling throughput problems; we measured throughput improvements of up to 270× on our moderately-sized system. We then apply our performance model to a pre-exascale system, where our model predicts throughput improvements of four orders of magnitude and provides insight into optimally configuring GHS on next generation systems.
ISSN:1094-3420
1741-2846
DOI:10.1177/10943420211051039