LLload: An Easy-to-Use HPC Utilization Tool
Format: | Article |
---|---|
Language: | English |
Abstract: | The increasing use and cost of high performance computing (HPC) require new
easy-to-use tools to enable HPC users and HPC systems engineers to
transparently understand the utilization of resources. The MIT Lincoln
Laboratory Supercomputing Center (LLSC) has developed a simple command, LLload,
to monitor and characterize HPC workloads. LLload plays an important role in
identifying opportunities for better utilization of compute resources. LLload
can be used to monitor jobs both programmatically and interactively. LLload can
characterize users' jobs using various LLload options to achieve better
efficiency. This information can be used to inform the user to optimize HPC
workloads and improve both CPU and GPU utilization. This includes improvements
using judicious oversubscription of the computing resources. Preliminary
results suggest significant improvement in GPU utilization and overall
throughput performance with GPU overloading in some cases. By enabling users to
observe and fix incorrect job submission and/or inappropriate execution setups,
LLload can increase the resource usage and improve the overall throughput
performance. LLload is a light-weight, easy-to-use tool for both HPC users and
HPC systems engineers to monitor HPC workloads to improve system utilization
and efficiency. |
---|---|
DOI: | 10.48550/arxiv.2410.21036 |