SPADE: Scalable Performance and Accuracy analysis for Distributed and Extreme-scale systems



Bibliographic Details
Main authors: Jagode, Heike; Moore, Shirley V.; Weaver, Vincent; Danalis, Anthony; Lauter, Christoph
Format: Image
Language: English
Description
Abstract: The SPADE project advances monitoring, optimization, evaluation, and decision-making capabilities for extreme-scale systems. In Year 1, the team is targeting several advanced monitoring capabilities. One is support for AMD's new RocProfiler SDK, which will enable analysis of hardware performance counters on AMD APUs such as the MI300A, which will be integrated into El Capitan. The SPADE team is also extending the PAPI library to support heterogeneous CPUs. This will let users simultaneously monitor the performance of chips that combine high-end and low-end cores, providing the data needed to tune how the system switches work between the different core types. Another initiative is cyPAPI, a Python interface to PAPI intended for the frameworks and tools being developed for Python in HPC environments. The team is experimenting with beta versions of cyPAPI in PyTorch to advance decision-making for mixed-precision tuning of machine learning applications.
DOI:10.6084/m9.figshare.26452465