Host-assisted zero-copy remote memory access communication on InfiniBand

Bibliographic details
Main authors:	Tipparaju, V., Santhanaraman, G., Nieplocha, J., Panda, O.K.
Format:	Conference paper
Language:	English
Subjects:	Access protocols; Bandwidth; Computer networks; Context; Data structures; Delay; Distributed processing; Laboratories; Libraries; Message passing

Description
Summary:	Summary form only given. Remote memory access (RMA) is an increasingly important communication model because of its excellent potential for overlapping communication with computation and for achieving high performance on modern networks with RDMA hardware such as InfiniBand. RMA plays a vital role in supporting the emerging global address space programming models. We describe how RMA can be implemented efficiently over InfiniBand. Capabilities not offered directly by the InfiniBand verbs layer can be implemented efficiently using the novel host-assisted approach, which achieves zero-copy communication and supports an excellent overlap of computation with communication. For contiguous data we achieve a small-message latency of 6 μs and a peak bandwidth of 830 MB/s for 'put', and a small-message latency of 12 μs and a peak bandwidth of 765 MB/s for 'get'. These numbers are almost as good as the performance of the native VAPI layer. For noncontiguous data, the host-assisted approach can deliver bandwidth close to that for contiguous data. We also demonstrate that host-assisted data-transfer operations tolerate CPU-intensive tasks better than the traditional host-based approach, because our approach requires minimal host involvement. Our implementation also supports a very high degree of overlap of computation and communication: for large messages, we achieved 99% overlap for contiguous data and up to 95% for noncontiguous data. The NAS MG and matrix multiplication benchmarks were used to validate the effectiveness of our approach and demonstrated excellent overall performance.
DOI:	10.1109/IPDPS.2004.1302943