Optimization of Collective Communication Operations in MPICH

We describe our work on improving the performance of collective communication operations in MPICH for clusters connected by switched networks. For each collective operation, we use multiple algorithms depending on the message size, with the goal of minimizing latency for short messages and minimizing bandwidth use for long messages. Although we have implemented new algorithms for all MPI (Message Passing Interface) collective operations, because of limited space we describe only the algorithms for allgather, broadcast, all-to-all, reduce-scatter, reduce, and allreduce. Performance results on a Myrinet-connected Linux cluster and an IBM SP indicate that, in all cases, the new algorithms significantly outperform the old algorithms used in MPICH on the Myrinet cluster, and, in many cases, they outperform the algorithms used in IBM's MPI on the SP. We also explore in further detail the optimization of two of the most commonly used collective operations, allreduce and reduce, particularly for long messages and non-power-of-two numbers of processes. The optimized algorithms for these operations perform several times better than the native algorithms on a Myrinet cluster, IBM SP, and Cray T3E. Our results indicate that to achieve the best performance for a collective communication operation, one needs to use a number of different algorithms and select the right algorithm for a particular message size and number of processes.
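
The selection idea in the abstract, using different algorithms for the same collective depending on message size, can be made concrete with a short sketch for allreduce. The sketch below is illustrative, not MPICH's internal code: the function names, the SHORT_MSG_BYTES threshold, and the simplifying assumptions (power-of-two process count, element count divisible by the number of processes, MPI_SUM on doubles) are choices made for this example, and the long-message path is expressed with standard MPI library calls rather than a hand-written implementation.

```c
#include <mpi.h>
#include <stdlib.h>

/* Illustrative cutover point in bytes; the thresholds MPICH actually uses
 * are tuned per platform and are not reproduced here. */
#define SHORT_MSG_BYTES 2048

/* Short-message path: recursive doubling. Every process ends up with the
 * full sum after log2(p) exchange steps, each carrying the whole vector.
 * This sketch assumes the number of processes is a power of two. */
static void allreduce_recursive_doubling(double *buf, int count, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    double *tmp = malloc((size_t)count * sizeof(double));
    for (int mask = 1; mask < size; mask <<= 1) {
        int partner = rank ^ mask;            /* pairwise exchange partner */
        MPI_Sendrecv(buf, count, MPI_DOUBLE, partner, 0,
                     tmp, count, MPI_DOUBLE, partner, 0,
                     comm, MPI_STATUS_IGNORE);
        for (int i = 0; i < count; i++)
            buf[i] += tmp[i];                 /* fold in the partner's data */
    }
    free(tmp);
}

/* Long-message path: a reduce-scatter followed by an allgather, expressed
 * here with standard MPI calls purely to show the composition. Assumes
 * count is divisible by the number of processes. */
static void allreduce_rsag(double *buf, int count, MPI_Comm comm)
{
    int size;
    MPI_Comm_size(comm, &size);
    int chunk = count / size;
    double *piece = malloc((size_t)chunk * sizeof(double));
    MPI_Reduce_scatter_block(buf, piece, chunk, MPI_DOUBLE, MPI_SUM, comm);
    MPI_Allgather(piece, chunk, MPI_DOUBLE, buf, chunk, MPI_DOUBLE, comm);
    free(piece);
}

/* Pick an algorithm from the message size, mirroring the selection idea
 * described in the abstract. buf holds the local input on entry and the
 * global sum on return. */
void allreduce_by_size(double *buf, int count, MPI_Comm comm)
{
    if ((size_t)count * sizeof(double) <= SHORT_MSG_BYTES)
        allreduce_recursive_doubling(buf, count, comm);
    else
        allreduce_rsag(buf, count, comm);
}
```

The cutover matters because recursive doubling sends the full vector in each of its log2(p) steps, which is cheap in latency but costly in bandwidth, whereas the reduce-scatter/allgather composition moves only about 2(p-1)/p of the vector per process, so it wins once messages are long enough for bandwidth to dominate.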

Bibliographic Details

Published in: The International Journal of High Performance Computing Applications, 2005-04, Vol. 19 (1), p. 49-66
Main authors: Thakur, Rajeev; Rabenseifner, Rolf; Gropp, William
Format: Article
Language: English
Publisher: Sage Publications, Thousand Oaks, CA
Subjects: Algorithms; Communication; Computer networks; Data communications; High performance systems; Information networks; Management; Optimization; Studies
Online access: Full text
DOI: 10.1177/1094342005051521
ISSN: 1094-3420
EISSN: 1741-2846