Performance debugging shared memory parallel programs using run-time dependence analysis
We describe a new approach to performance debugging that focuses on automatically identifying computation transformations to reduce synchronization and communication. By grouping writes together into equivalence classes , we are able to tractably collect information from long-running programs. Our p...
Gespeichert in:
Veröffentlicht in: | Performance evaluation review 1997-06, Vol.25 (1), p.75-87 |
---|---|
Hauptverfasser: | , |
Format: | Artikel |
Sprache: | eng |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | We describe a new approach to performance debugging that focuses on automatically identifying computation transformations to reduce synchronization and communication. By grouping writes together into
equivalence classes
, we are able to tractably collect information from long-running programs. Our performance debugger analyzes this information and suggests computation transformations in terms of the source code. We present the transformations suggested by the debugger on a suite of four applications. For Barnes-Hut and Shallow, implementing the debugger suggestions improved the performance by a factor of 1.32 and 34 times respectively on an 8-processor IBM SP2. For Ocean, our debugger identified excess synchronization that did not have a significant impact on performance. ILINK, a genetic linkage analysis program widely used by geneticists, is already well optimized. We use it only to demonstrate the feasibility of our approach to long-running applications.We also give details on how our approach can be implemented. We use novel techniques to convert control dependences to data dependences, and to compute the source operands of stores. We report on the impact of our instrumentation on the same application suite we use for performance debugging. The instrumentation slows down the execution by a factor of between 4 and 169 times. The log files produced during execution were all less than 2.5 Mbytes in size. |
---|---|
ISSN: | 0163-5999 |
DOI: | 10.1145/258623.258678 |