Sparbit: Towards to a Logarithmic-Cost and Data Locality-Aware MPI Allgather Algorithm



Bibliographic details
Published in:	Journal of grid computing 2023-06, Vol.21 (2), p.18, Article 18
Main authors:	Loch, Wilton Jaciel, Koslovski, Guilherme Piêgas
Format:	Article
Language:	English
Subjects:	Algorithms; Communication; Computer Science; Data exchange; Management of Computing and Information Systems; Message passing; Processor Architectures; User Interfaces and Human Computer Interaction

Description
Abstract:	Collective communication operations are considered critical for improving the performance of exascale-ready and high-performance computing applications. In this work we focus on the Message-Passing Interface (MPI) Allgather and Allgatherv many-to-many collectives, which are among the most used and most time-consuming operations. Each existing MPI algorithm for these calls suffers from different operational and performance limitations, which may include working only in restricted cases, requiring a number of communication steps that grows linearly with the number of processes, needing memory copies and shifts to ensure correct data organization, and using non-local data exchange patterns, most of which add to the total operation time. Together, these characteristics demand a careful choice among the alternatives for executing the call, and there is no silver-bullet algorithm that is best in all cases. We propose the Stripe Parallel Binomial Trees (Sparbit) algorithm, which employs the binomial tree distribution to perform data exchanges with optimal time costs and no usage restrictions. It also maintains a much more local communication pattern that minimizes the delays due to long-range exchanges, extracting more performance from current systems than asymptotically equivalent traditional algorithms. Experimental results indicate that nearly 40% of all calls to Allgather could see mean execution-time reductions of 20% to 28% by employing Sparbit, with maximum reductions approaching 74%. For Allgatherv, results vary widely depending on the distribution of block sizes across the processes.
ISSN:	1570-7873 1572-9184
DOI:	10.1007/s10723-023-09650-5
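
Illustrative note: the abstract contrasts algorithms that need a linear number of communication steps with logarithmic-cost ones that may need local shifts to restore block order. The sketch below is not Sparbit. It is a minimal pure-Python simulation, with no MPI calls, of two traditional Allgather schemes: a ring exchange and a Bruck-style exchange. The function names and the list-of-lists representation of per-rank buffers are assumptions made only for illustration.

```python
def ring_allgather(p):
    """Simulate a ring allgather over p ranks; return (buffers, steps)."""
    have = [[r] for r in range(p)]  # rank r starts with only its own block
    steps = 0
    for _ in range(p - 1):
        # Every rank forwards the block it received last to its right neighbour.
        sent = [have[r][-1] for r in range(p)]
        for r in range(p):
            have[r].append(sent[(r - 1) % p])
        steps += 1
    return [sorted(h) for h in have], steps


def bruck_allgather(p):
    """Simulate a Bruck-style allgather over p ranks; return (buffers, steps)."""
    have = [[r] for r in range(p)]
    steps = 0
    d = 1
    while d < p:
        count = min(d, p - d)  # the last step may carry fewer blocks
        sent = [have[r][:count] for r in range(p)]
        for r in range(p):
            # Rank r receives from the rank d positions ahead of it.
            have[r].extend(sent[(r + d) % p])
        d *= 2
        steps += 1
    # Rank r now holds blocks r, r+1, ..., r+p-1 (mod p); a local shift
    # restores rank order, which is the extra memory movement the abstract mentions.
    ordered = [h[p - r:] + h[:p - r] for r, h in enumerate(have)]
    return ordered, steps


if __name__ == "__main__":
    for p in (4, 6, 16):
        _, ring_steps = ring_allgather(p)
        _, bruck_steps = bruck_allgather(p)
        print(p, ring_steps, bruck_steps)
```

In this simulation the ring takes p - 1 steps and the Bruck-style exchange takes ceil(log2 p) steps. Note that the distance between communicating ranks doubles at every Bruck step, which is the kind of long-range exchange the abstract says Sparbit aims to reduce.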