Automatic Discovery of Collective Communication Patterns in Parallelized Task Graphs

Bibliographic Details
Published in: International Journal of Parallel Programming, 2024-06, Vol. 52 (3), p. 171-186
Main authors: Knorr, Fabian; Salzmann, Philip; Thoman, Peter; Fahringer, Thomas
Format: Article
Language: English
Online access: Full text
Description: Collective communication APIs equip MPI vendors with the necessary context to optimize cluster-wide operations on the basis of theoretical complexity models and characteristics of the involved interconnects. Modern HPC runtime systems with a programmability focus can perform dependency analysis to eliminate the need for manual communication entirely. Profiting from optimized collective routines in this context often requires global analysis of the implicit point-to-point communication pattern or tight constraints on the data access patterns allowed inside kernels. The Celerity API provides a high degree of freedom for both runtime implementors and application developers by tying transparent work assignment to data access patterns through user-defined range-mapper functions. Canonically, data dependencies are resolved through an intra-node coherence model and inter-node point-to-point communication. This paper presents Collective Pattern Discovery (CPD), a fully distributed, coordination-free method for detecting collective communication patterns on parallelized task graphs. Through extensive scheduling and communication microbenchmarks as well as a strong scaling experiment on a compute-intensive application, we demonstrate that CPD can achieve substantial performance gains in the Celerity model.
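The description above centers on recognizing collective patterns from the per-node data access ranges that range mappers imply. The following sketch is purely illustrative and not Celerity's actual implementation: `classify_pattern` and the `(start, end)` range encoding are hypothetical, but they show the core idea of a node deciding, without coordination, that a set of implicit point-to-point transfers amounts to an allgather.

```python
# Illustrative sketch only: a simplified stand-in for Collective Pattern
# Discovery. `classify_pattern` and the (start, end) range encoding are
# hypothetical, not part of the Celerity API.

def classify_pattern(writes, reads):
    """writes: {node: (start, end)} half-open chunk written by each node.
    reads:  {node: (start, end)} range each node reads afterwards.
    Returns "allgather" when the written chunks tile the buffer without
    gaps and every node reads the entire buffer; else "point-to-point"."""
    chunks = sorted(writes.values())
    # The written chunks must tile the buffer contiguously.
    contiguous = all(prev_end == next_start
                     for (_, prev_end), (next_start, _) in zip(chunks, chunks[1:]))
    full_buffer = (chunks[0][0], chunks[-1][1])
    if contiguous and all(r == full_buffer for r in reads.values()):
        return "allgather"
    return "point-to-point"

# 4 nodes, each producing a distinct 256-element chunk of a 1024-element
# buffer, then every node reading the whole buffer: an allgather pattern.
writes = {n: (n * 256, (n + 1) * 256) for n in range(4)}
reads = {n: (0, 1024) for n in range(4)}
print(classify_pattern(writes, reads))  # -> allgather
```

A real scheduler must of course reason about multidimensional ranges and partial overlaps; the point here is only that the classification is a local function of range information every node already has.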
DOI: 10.1007/s10766-024-00767-y
ISSN: 0885-7458
EISSN: 1573-7640
Source: SpringerLink Journals - AutoHoldings
Subjects: Application programming interface; Communication; Computer Science; Context; Graphs; Message passing; Processor Architectures; Software Engineering/Programming and Operating Systems; Task scheduling; Theory of Computation