An embedded sectioning scheme for multiprocessor topology-aware mapping of irregular applications
Published in: | The International Journal of High Performance Computing Applications, 2017-01, Vol. 31 (1), pp. 91-103 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | We consider the problem of mapping irregular applications to multiprocessor architectures whose interconnect topologies affect the latencies of data movement across processor nodes. The starting point for solutions to this problem is a pair of suitable weighted graph representations, one of the irregular application and one of the processor topology. Prior results for this problem have demonstrated that graph partitioning approaches can provide high-quality solutions. Additionally, when coordinate information is available for the weighted graph of the application, geometric mapping schemes can also provide high-quality solutions. We develop and present a scheme that we call ‘embedded sectioning’ that directly computes a locality-enhancing embedding of the weighted graph representation, which is then mapped to the processor topology using recursive coordinate bisection. Our scheme is specifically directed at gaining high-quality mappings for highly irregular applications where the amount of communication can vary greatly. We evaluate the quality of mappings produced by embedded sectioning for mesh-based processor topologies using well-accepted measures including congestion, dilation and their product, referred to as the communication volume. For a test suite of unit-weight graphs mapped to a 32 × 32 mesh of processors, our method improves congestion by 26%, dilation by 52% and communication volume by 64% relative to the best values of these measures from nine other schemes. Additionally, we observe that these improvements increase with an increase in the skewness of communication in applications. For a test suite with a skewness of two, the corresponding improvements for congestion, dilation and communication volume are 72%, 52% and 87%, respectively. |
ISSN: | 1094-3420, 1741-2846 |
DOI: | 10.1177/1094342015597082 |
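
The abstract names congestion, dilation and their product (the communication volume) as the quality measures for a mapping onto a mesh. The sketch below illustrates one common way such measures can be computed. It is not taken from the article: the function names `xy_route` and `mapping_quality`, the use of dimension-ordered (X then Y) routing, and the choice of the maximum over edges and links are all assumptions made for illustration, and the article's exact definitions may differ.

```python
from collections import defaultdict


def xy_route(src, dst):
    """Return the mesh links on a dimension-ordered route (X first, then Y).

    Assumption: messages follow XY routing on a 2D mesh without wraparound.
    """
    x, y = src
    links = []
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def mapping_quality(edges, placement):
    """Compute (congestion, dilation, communication volume) for a mapping.

    edges: iterable of (u, v, weight) from the application graph.
    placement: dict mapping each vertex to its (x, y) mesh processor.
    Assumed definitions: dilation is the longest route in hops, congestion
    is the heaviest total weight on any single undirected link, and the
    communication volume is their product, as stated in the abstract.
    """
    load = defaultdict(float)
    dilation = 0
    for u, v, w in edges:
        route = xy_route(placement[u], placement[v])
        dilation = max(dilation, len(route))
        for a, b in route:
            load[frozenset((a, b))] += w  # treat each link as undirected
    congestion = max(load.values(), default=0.0)
    return congestion, dilation, congestion * dilation
```

For example, with three vertices placed in a row at (0, 0), (1, 0) and (2, 0) and edge weights 1.0, 2.0 and 1.0 for a-b, a-c and b-c, both mesh links carry a total weight of 3.0 and the longest route is 2 hops, so `mapping_quality` returns `(3.0, 2, 6.0)`.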