Sparbit: Towards to a Logarithmic-Cost and Data Locality-Aware MPI Allgather Algorithm
Published in: | Journal of Grid Computing, 2023-06, Vol. 21 (2), p. 18, Article 18 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Abstract: | Collective communication operations are considered critical for improving the performance of exascale-ready and high-performance computing applications. In this work, we focus on the Message-Passing Interface (MPI) Allgather and Allgatherv many-to-many collectives, which are amongst the most utilized and time-consuming operations. Each MPI algorithm for these calls suffers from different operational and performance limitations, which may include working only for restricted cases, requiring a number of communication steps that grows linearly with the number of processes, performing memory copies and shifts to ensure correct data organization, and using non-local data exchange patterns, most of which contribute negatively to the total operation time. Together, these characteristics demand careful choices among the alternatives for executing the call, since there is no "silver bullet" algorithm that is best for all cases. We propose the Stripe Parallel Binomial Trees (Sparbit) algorithm, which employs the binomial tree distribution to perform data exchanges with optimal time costs and no usage restrictions. It also maintains a much more local communication pattern that minimizes the delays caused by long-range exchanges, allowing more performance to be extracted from current systems than with asymptotically equivalent traditional algorithms. Experimental results indicate that nearly 40% of all calls to Allgather could see mean reductions of 20% to 28% in execution time by employing Sparbit, with maximum reductions of nearly 74%. For Allgatherv, results are highly variable, depending on the distribution of block sizes across the processes. |
---|---|
ISSN: | 1570-7873, 1572-9184 |
DOI: | 10.1007/s10723-023-09650-5 |
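The abstract contrasts Allgather algorithms whose number of communication steps grows linearly with the process count against logarithmic-cost schedules, and notes that some algorithms need memory shifts to restore block order. The Python sketch below is not Sparbit, whose exact exchange pattern the abstract does not specify. Instead, it simulates two well-known traditional Allgather schedules, the ring algorithm and the Bruck algorithm, in a single address space. The function names and the block labels `b0`, `b1`, ... are illustrative assumptions. The sketch counts communication rounds (p - 1 for the ring and ceil(log2 p) for Bruck, for any process count p) and shows the final local rotation that Bruck needs to put the blocks into rank order.

```python
"""Illustrative simulation of two traditional Allgather schedules (not Sparbit)."""
import math


def ring_allgather(p):
    """Ring schedule: p - 1 rounds, each rank forwards one block to its right neighbor."""
    # buffers[r] maps block index -> data; each rank starts with only its own block.
    buffers = [{r: f"b{r}"} for r in range(p)]
    steps = 0
    for s in range(p - 1):
        # Rank r forwards block (r - s) mod p (its own block when s == 0,
        # otherwise the block it received in the previous round) to rank (r + 1) mod p.
        messages = [((r + 1) % p, (r - s) % p, buffers[r][(r - s) % p]) for r in range(p)]
        for dst, blk, data in messages:
            buffers[dst][blk] = data
        steps += 1
    return [[buf[i] for i in range(p)] for buf in buffers], steps


def bruck_allgather(p):
    """Bruck schedule: ceil(log2 p) rounds plus a final local rotation."""
    # local[r][k] holds the block originally owned by rank (r + k) mod p.
    local = [[f"b{r}"] for r in range(p)]
    steps, d = 0, 1
    while d < p:
        # The last round is partial when p is not a power of two.
        count = min(d, p - d)
        # Rank r receives the first `count` blocks held by rank (r + d) mod p
        # (equivalently, every rank sends its first `count` blocks to rank (r - d) mod p).
        incoming = [local[(r + d) % p][:count] for r in range(p)]
        for r in range(p):
            local[r].extend(incoming[r])
        steps += 1
        d *= 2
    # Final memory shift: rotate so that position i holds the block of rank i.
    result = [[local[r][(i - r) % p] for i in range(p)] for r in range(p)]
    return result, steps


if __name__ == "__main__":
    for p in (2, 5, 8, 12):
        expected = [f"b{i}" for i in range(p)]
        ring, ring_steps = ring_allgather(p)
        bruck, bruck_steps = bruck_allgather(p)
        # Every rank must end up with all p blocks in rank order.
        assert all(row == expected for row in ring)
        assert all(row == expected for row in bruck)
        print(f"p={p:2d}  ring rounds={ring_steps:2d}  "
              f"bruck rounds={bruck_steps}  ceil(log2 p)={math.ceil(math.log2(p))}")
```

Under the usual latency-bandwidth cost model, both schedules deliver the same (p - 1) blocks to each process. The difference therefore lies in the latency term, which grows linearly for the ring and logarithmically for Bruck, and in the extra rotation copy that Bruck performs at the end. This is the kind of trade-off between algorithms that the abstract describes.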