Optimization of Collective Communication Operations in MPICH

We describe our work on improving the performance of collective communication operations in MPICH for clusters connected by switched networks. For each collective operation, we use multiple algorithms depending on the message size, with the goal of minimizing latency for short messages and minimizing bandwidth use for long messages. Although we have implemented new algorithms for all MPI (Message Passing Interface) collective operations, because of limited space we describe only the algorithms for allgather, broadcast, all-to-all, reduce-scatter, reduce, and allreduce. Performance results on a Myrinet-connected Linux cluster and an IBM SP indicate that, in all cases, the new algorithms significantly outperform the old algorithms used in MPICH on the Myrinet cluster, and, in many cases, they outperform the algorithms used in IBM's MPI on the SP. We also explore in further detail the optimization of two of the most commonly used collective operations, allreduce and reduce, particularly for long messages and non-power-of-two numbers of processes. The optimized algorithms for these operations perform several times better than the native algorithms on a Myrinet cluster, IBM SP, and Cray T3E. Our results indicate that to achieve the best performance for a collective communication operation, one needs to use a number of different algorithms and select the right algorithm for a particular message size and number of processes.
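
The selection idea in the abstract, using different algorithms for the same collective depending on message size, can be made concrete with a short sketch for allreduce. The sketch below is illustrative, not MPICH's internal code: the function names, the SHORT_MSG_BYTES threshold, and the simplifying assumptions (power-of-two process count, element count divisible by the number of processes, MPI_SUM on doubles) are choices made for this example, and the long-message path is expressed with standard MPI library calls rather than a hand-written implementation.

```c
#include <mpi.h>
#include <stdlib.h>

/* Illustrative cutover point in bytes; the thresholds MPICH actually uses
 * are tuned per platform and are not reproduced here. */
#define SHORT_MSG_BYTES 2048

/* Short-message path: recursive doubling. Every process ends up with the
 * full sum after log2(p) exchange steps, each carrying the whole vector.
 * This sketch assumes the number of processes is a power of two. */
static void allreduce_recursive_doubling(double *buf, int count, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    double *tmp = malloc((size_t)count * sizeof(double));
    for (int mask = 1; mask < size; mask <<= 1) {
        int partner = rank ^ mask;            /* pairwise exchange partner */
        MPI_Sendrecv(buf, count, MPI_DOUBLE, partner, 0,
                     tmp, count, MPI_DOUBLE, partner, 0,
                     comm, MPI_STATUS_IGNORE);
        for (int i = 0; i < count; i++)
            buf[i] += tmp[i];                 /* fold in the partner's data */
    }
    free(tmp);
}

/* Long-message path: a reduce-scatter followed by an allgather, expressed
 * here with standard MPI calls purely to show the composition. Assumes
 * count is divisible by the number of processes. */
static void allreduce_rsag(double *buf, int count, MPI_Comm comm)
{
    int size;
    MPI_Comm_size(comm, &size);
    int chunk = count / size;
    double *piece = malloc((size_t)chunk * sizeof(double));
    MPI_Reduce_scatter_block(buf, piece, chunk, MPI_DOUBLE, MPI_SUM, comm);
    MPI_Allgather(piece, chunk, MPI_DOUBLE, buf, chunk, MPI_DOUBLE, comm);
    free(piece);
}

/* Pick an algorithm from the message size, mirroring the selection idea
 * described in the abstract. buf holds the local input on entry and the
 * global sum on return. */
void allreduce_by_size(double *buf, int count, MPI_Comm comm)
{
    if ((size_t)count * sizeof(double) <= SHORT_MSG_BYTES)
        allreduce_recursive_doubling(buf, count, comm);
    else
        allreduce_rsag(buf, count, comm);
}
```

The cutover matters because recursive doubling sends the full vector in each of its log2(p) steps, which is cheap in latency but costly in bandwidth, whereas the reduce-scatter/allgather composition moves only about 2(p-1)/p of the vector per process, so it wins once messages are long enough for bandwidth to dominate.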

Bibliographic Details

Published in: The International Journal of High Performance Computing Applications, 2005-04, Vol. 19 (1), p. 49-66
Main authors: Thakur, Rajeev; Rabenseifner, Rolf; Gropp, William
Format: Article
Language: English
Publisher: Sage Publications, Thousand Oaks, CA
Subjects: Algorithms; Communication; Computer networks; Data communications; High performance systems; Information networks; Management; Optimization; Studies
Online access: Full text
DOI: 10.1177/1094342005051521
ISSN: 1094-3420
EISSN: 1741-2846