LIMITLESS — LIght-weight MonItoring Tool for LargE Scale Systems


Bibliographic details
Published in: Microprocessors and Microsystems, 2022-09, Vol. 93, p. 104586, Article 104586
Main authors: Cascajo, Alberto; Singh, David E.; Carretero, Jesus
Format: Article
Language: English
Description
Abstract: This work presents LIMITLESS, an HPC framework that provides new strategies for monitoring clusters. LIMITLESS is a scalable, light-weight monitor that integrates with other HPC runtimes to obtain a holistic view of the system, combining platform and application monitoring. The paper describes the novel components of the architecture, including new approaches for reaching higher scalability based on a combination of in-transit processing and performance prediction. It also presents a methodology for improving application scheduling by means of machine learning classifiers and application profiling. A practical evaluation on simulated and real platforms shows significant monitoring scalability and data-retrieval capacity, together with reduced overheads. Results show that the performance prediction techniques reduce communications and the number of monitoring packets by more than 90% on average, and that fine-grain scheduling allows LIMITLESS to run applications in shared nodes, reducing the makespan by 25% and saving resources.
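Illustrative note: the abstract does not say how performance prediction reduces the number of monitoring packets. One common way to obtain this kind of saving is to let the monitored node and the collector share the same simple predictor, and to transmit a sample only when the prediction misses by more than a tolerance. The sketch below shows that generic idea under stated assumptions. It is not the LIMITLESS implementation: the "last transmitted value" predictor, the function names, the tolerance, and the sample data are all hypothetical.

```python
def suppress_predictable(samples, tolerance):
    """Return the (index, value) pairs a node would actually transmit.

    Assumption (hypothetical, not from the paper): both sides predict
    "same as the last transmitted value", and a packet is sent only when
    that prediction is off by more than `tolerance`.
    """
    sent = []
    last_sent = None
    for i, value in enumerate(samples):
        if last_sent is None or abs(value - last_sent) > tolerance:
            sent.append((i, value))
            last_sent = value
    return sent


def reconstruct(sent, length):
    """Rebuild the series on the collector side from the received packets."""
    packets = dict(sent)
    series = []
    current = None
    for i in range(length):
        if i in packets:
            current = packets[i]  # a real packet overrides the prediction
        series.append(current)    # otherwise the predicted value is kept
    return series


# Made-up CPU load samples (percent) with two load changes.
cpu_load = [50.0, 50.5, 49.8, 50.2, 75.0, 75.3, 74.9, 75.1, 50.0, 50.1]
sent = suppress_predictable(cpu_load, tolerance=2.0)
rebuilt = reconstruct(sent, len(cpu_load))
```

On this made-up series, 3 of the 10 samples are transmitted, and every reconstructed value stays within the chosen tolerance of the measured one. The tolerance trades packet count against accuracy at the collector.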
ISSN: 0141-9331, 1872-9436
DOI: 10.1016/j.micpro.2022.104586