Lessons learned at 208K: towards debugging millions of cores
Petascale systems will present several new challenges to performance and correctness tools. Such machines may contain millions of cores, requiring that tools use scalable data structures and analysis algorithms to collect and to process application data. In addition, at such scales, each tool itself...
Gespeichert in:
Hauptverfasser: | , , , , , , , |
---|---|
Format: | Tagungsbericht |
Sprache: | eng |
Schlagworte: |
Software and its engineering
> Software creation and management
> Software development process management
Software and its engineering
> Software creation and management
> Software verification and validation
> Formal software verification
Software and its engineering
> Software creation and management
> Software verification and validation
> Software defect analysis
> Software testing and debugging
Software and its engineering
> Software notations and tools
> General programming languages
> Language types
> Concurrent programming languages
|
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Petascale systems will present several new challenges to performance and correctness tools. Such machines may contain millions of cores, requiring that tools use scalable data structures and analysis algorithms to collect and to process application data. In addition, at such scales, each tool itself will become a large parallel application - already, debugging the full Blue-Gene/L (BG/L) installation at the Lawrence Livermore National Laboratory requires employing 1664 tool daemons. To reach such sizes and beyond, tools must use a scalable communication infrastructure and manage their own tool processes efficiently. Some system resources, such as the file system, may also become tool bottlenecks.
In this paper, we present challenges to petascale tool development, using the Stack Trace Analysis Tool (STAT) as a case study. STAT is a lightweight tool that gathers and merges stack traces from a parallel application to identify process equivalence classes. We use results gathered at thousands of tasks on an Infiniband cluster and results up to 208K processes on BG/L to identify current scalability issues as well as challenges that will be faced at the petascale. We then present implemented solutions to these challenges and show the resulting performance improvements. We also discuss future plans to meet the debugging demands of petascale machines. |
---|---|
DOI: | 10.5555/1413370.1413397 |