Hiding Global Communication Latency in the GMRES Algorithm on Massively Parallel Machines

In the generalized minimal residual method (GMRES), the global all-to-all communication required in each iteration for orthogonalization and normalization of the Krylov base vectors is becoming a performance bottleneck on massively parallel machines. Long latencies, system noise, and load imbalance...
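
As a hedged illustration of the bottleneck described above: the full abstract in this record proposes nonblocking global reductions, so that the reduction overlaps with other work instead of stalling every process. The sketch below mimics that idea in plain Python, with a thread pool standing in for the network. The names `global_sum`, `local_work`, `blocking_step`, and `overlapped_step`, as well as the artificial latency, are assumptions made for illustration and do not come from the article. On a real machine, a nonblocking MPI collective such as `MPI_Iallreduce` would play the role of `pool.submit`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def global_sum(partials, latency=0.05):
    """Stand-in for a global reduction: sleep to mimic latency, then sum."""
    time.sleep(latency)
    return sum(partials)


def local_work(x):
    """Stand-in for local computation, e.g. a sparse matrix-vector product."""
    return [2.0 * xi for xi in x]


def blocking_step(partials, x):
    """Classical ordering: wait for the reduction, then compute."""
    dot = global_sum(partials)  # every process idles here
    y = local_work(x)
    return dot, y


def overlapped_step(partials, x, pool):
    """Overlapped ordering: start the reduction, compute, then wait."""
    pending = pool.submit(global_sum, partials)  # start, do not wait
    y = local_work(x)                            # useful work meanwhile
    dot = pending.result()                       # block only when needed
    return dot, y


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        print(overlapped_step([1.0, 2.0, 3.0], [1.0, 2.0], pool))
```

Both orderings return the same values; only the point at which the caller waits differs. That is the sense in which, per the abstract, the result of a reduction is used only after other work has been done.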

Bibliographic Details
Published in:	SIAM journal on scientific computing 2013-01, Vol.35 (1), p.C48-C71
Main authors:	Ghysels, P; Ashby, T J; Meerbergen, K; Vanroose, W
Format:	Article
Language:	eng
Subjects:	Algorithms; Communication; Computation; Computer science; Eigenvalues; Iterative methods; Mathematical analysis; Mathematical models; Methods; Noise; Reduction; Synchronism; Synchronization
Online access:	Full text

container_end_page	C71
container_issue	1
container_start_page	C48
container_title	SIAM journal on scientific computing
container_volume	35
creator	Ghysels, P; Ashby, T J; Meerbergen, K; Vanroose, W
description	In the generalized minimal residual method (GMRES), the global all-to-all communication required in each iteration for orthogonalization and normalization of the Krylov base vectors is becoming a performance bottleneck on massively parallel machines. Long latencies, system noise, and load imbalance cause these global reductions to become very costly global synchronizations. In this work, we propose the use of nonblocking or asynchronous global reductions to hide these global communication latencies by overlapping them with other communications and calculations. A pipelined variation of GMRES is presented in which the result of a global reduction is used only one or more iterations after the communication phase has started. This way, global synchronization is relaxed and scalability is much improved at the expense of some extra computations. The numerical instabilities that inevitably arise due to the typical monomial basis by powering the matrix are reduced and often annihilated by using Newton or Chebyshev bases instead. Our parallel experiments on a medium-sized cluster show significant speedups of the pipelined solvers compared to standard GMRES. An analytical model is used to extrapolate the performance to future exascale systems.
doi_str_mv	10.1137/12086563X
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1315645991</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1315645991</sourcerecordid><originalsourceid>FETCH-LOGICAL-c391t-8ea0e95eedb6c14449a6f2a0c7d6b3bdbf6fdd08658aece32cc049a543896a123</originalsourceid><addsrcrecordid>eNpd0N9LwzAQB_AgCs7pg_9BwBd9qOaaNG0ex5ibMFH8AfpU0jTdMtJkJq2w_96OiQ8-3XF8OO6-CF0CuQWg-R2kpOAZpx9HaAREZEkOIj_e95wlRZpnp-gsxg0hwJlIR-hzYWrjVnhufSUtnvq27Z1RsjPe4aXstFM7bBzu1hrPH19mr3hiVz6Ybt3iQTzKGM23tjv8LIO0VtthpNbG6XiOThppo774rWP0fj97my6S5dP8YTpZJooK6JJCS6JFpnVdcQWMMSF5k0qi8ppXtKqrhjd1vX-qkFppmipFBpMxWgguIaVjdH3Yuw3-q9exK1sTlbZWOu37WAKFjLNMCBjo1T-68X1ww3UlpDyHNKcFG9TNQangYwy6KbfBtDLsSiDlPuTyL2T6AzQ2beQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1267127384</pqid></control><display><type>article</type><title>Hiding Global Communication Latency in the GMRES Algorithm on Massively Parallel Machines</title><source>SIAM Journals Online</source><creator>Ghysels, P ; Ashby, T J ; Meerbergen, K ; Vanroose, W</creator><creatorcontrib>Ghysels, P ; Ashby, T J ; Meerbergen, K ; Vanroose, W</creatorcontrib><description>In the generalized minimal residual method (GMRES), the global all-to-all communication required in each iteration for orthogonalization and normalization of the Krylov base vectors is becoming a performance bottleneck on massively parallel machines. Long latencies, system noise, and load imbalance cause these global reductions to become very costly global synchronizations. In this work, we propose the use of nonblocking or asynchronous global reductions to hide these global communication latencies by overlapping them with other communications and calculations. A pipelined variation of GMRES is presented in which the result of a global reduction is used only one or more iterations after the communication phase has started. 
This way, global synchronization is relaxed and scalability is much improved at the expense of some extra computations. The numerical instabilities that inevitably arise due to the typical monomial basis by powering the matrix are reduced and often annihilated by using Newton or Chebyshev bases instead. Our parallel experiments on a medium-sized cluster show significant speedups of the pipelined solvers compared to standard GMRES. An analytical model is used to extrapolate the performance to future exascale systems. [PUBLICATION ABSTRACT]</description><identifier>ISSN: 1064-8275</identifier><identifier>EISSN: 1095-7197</identifier><identifier>DOI: 10.1137/12086563X</identifier><language>eng</language><publisher>Philadelphia: Society for Industrial and Applied Mathematics</publisher><subject>Algorithms ; Communication ; Computation ; Computer science ; Eigenvalues ; Iterative methods ; Mathematical analysis ; Mathematical models ; Methods ; Noise ; Reduction ; Synchronism ; Synchronization</subject><ispartof>SIAM journal on scientific computing, 2013-01, Vol.35 (1), p.C48-C71</ispartof><rights>2013, Society for Industrial and Applied Mathematics</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c391t-8ea0e95eedb6c14449a6f2a0c7d6b3bdbf6fdd08658aece32cc049a543896a123</citedby><cites>FETCH-LOGICAL-c391t-8ea0e95eedb6c14449a6f2a0c7d6b3bdbf6fdd08658aece32cc049a543896a123</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,3171,27901,27902</link.rule.ids></links><search><creatorcontrib>Ghysels, P</creatorcontrib><creatorcontrib>Ashby, T J</creatorcontrib><creatorcontrib>Meerbergen, K</creatorcontrib><creatorcontrib>Vanroose, W</creatorcontrib><title>Hiding Global Communication Latency in the GMRES Algorithm on Massively Parallel 
Machines</title><title>SIAM journal on scientific computing</title><description>In the generalized minimal residual method (GMRES), the global all-to-all communication required in each iteration for orthogonalization and normalization of the Krylov base vectors is becoming a performance bottleneck on massively parallel machines. Long latencies, system noise, and load imbalance cause these global reductions to become very costly global synchronizations. In this work, we propose the use of nonblocking or asynchronous global reductions to hide these global communication latencies by overlapping them with other communications and calculations. A pipelined variation of GMRES is presented in which the result of a global reduction is used only one or more iterations after the communication phase has started. This way, global synchronization is relaxed and scalability is much improved at the expense of some extra computations. The numerical instabilities that inevitably arise due to the typical monomial basis by powering the matrix are reduced and often annihilated by using Newton or Chebyshev bases instead. Our parallel experiments on a medium-sized cluster show significant speedups of the pipelined solvers compared to standard GMRES. An analytical model is used to extrapolate the performance to future exascale systems. 
[PUBLICATION ABSTRACT]</description><subject>Algorithms</subject><subject>Communication</subject><subject>Computation</subject><subject>Computer science</subject><subject>Eigenvalues</subject><subject>Iterative methods</subject><subject>Mathematical analysis</subject><subject>Mathematical models</subject><subject>Methods</subject><subject>Noise</subject><subject>Reduction</subject><subject>Synchronism</subject><subject>Synchronization</subject><issn>1064-8275</issn><issn>1095-7197</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><sourceid>8G5</sourceid><sourceid>BENPR</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNpd0N9LwzAQB_AgCs7pg_9BwBd9qOaaNG0ex5ibMFH8AfpU0jTdMtJkJq2w_96OiQ8-3XF8OO6-CF0CuQWg-R2kpOAZpx9HaAREZEkOIj_e95wlRZpnp-gsxg0hwJlIR-hzYWrjVnhufSUtnvq27Z1RsjPe4aXstFM7bBzu1hrPH19mr3hiVz6Ybt3iQTzKGM23tjv8LIO0VtthpNbG6XiOThppo774rWP0fj97my6S5dP8YTpZJooK6JJCS6JFpnVdcQWMMSF5k0qi8ppXtKqrhjd1vX-qkFppmipFBpMxWgguIaVjdH3Yuw3-q9exK1sTlbZWOu37WAKFjLNMCBjo1T-68X1ww3UlpDyHNKcFG9TNQangYwy6KbfBtDLsSiDlPuTyL2T6AzQ2beQ</recordid><startdate>20130101</startdate><enddate>20130101</enddate><creator>Ghysels, P</creator><creator>Ashby, T J</creator><creator>Meerbergen, K</creator><creator>Vanroose, W</creator><general>Society for Industrial and Applied 
Mathematics</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7WY</scope><scope>7WZ</scope><scope>7X2</scope><scope>7XB</scope><scope>87Z</scope><scope>88A</scope><scope>88F</scope><scope>88I</scope><scope>88K</scope><scope>8AL</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FK</scope><scope>8FL</scope><scope>8G5</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>ATCPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>D1I</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>KB.</scope><scope>L.-</scope><scope>L6V</scope><scope>LK8</scope><scope>M0C</scope><scope>M0K</scope><scope>M0N</scope><scope>M1Q</scope><scope>M2O</scope><scope>M2P</scope><scope>M2T</scope><scope>M7P</scope><scope>M7S</scope><scope>MBDVC</scope><scope>P5Z</scope><scope>P62</scope><scope>PATMY</scope><scope>PDBOC</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PTHSS</scope><scope>PYCSY</scope><scope>Q9U</scope><scope>7SC</scope><scope>7TB</scope><scope>8FD</scope><scope>FR3</scope><scope>H8D</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20130101</creationdate><title>Hiding Global Communication Latency in the GMRES Algorithm on Massively Parallel Machines</title><author>Ghysels, P ; Ashby, T J ; Meerbergen, K ; Vanroose, 
W</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c391t-8ea0e95eedb6c14449a6f2a0c7d6b3bdbf6fdd08658aece32cc049a543896a123</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Algorithms</topic><topic>Communication</topic><topic>Computation</topic><topic>Computer science</topic><topic>Eigenvalues</topic><topic>Iterative methods</topic><topic>Mathematical analysis</topic><topic>Mathematical models</topic><topic>Methods</topic><topic>Noise</topic><topic>Reduction</topic><topic>Synchronism</topic><topic>Synchronization</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ghysels, P</creatorcontrib><creatorcontrib>Ashby, T J</creatorcontrib><creatorcontrib>Meerbergen, K</creatorcontrib><creatorcontrib>Vanroose, W</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>Agricultural Science Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Biology Database (Alumni Edition)</collection><collection>Military Database (Alumni Edition)</collection><collection>Science Database (Alumni Edition)</collection><collection>Telecommunications (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>Research Library (Alumni Edition)</collection><collection>Materials Science & Engineering 
Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>Agricultural & Environmental Science Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Materials Science Collection</collection><collection>ProQuest Central Korea</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>Materials Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>ProQuest Engineering Collection</collection><collection>ProQuest Biological Science Collection</collection><collection>ABI/INFORM Global</collection><collection>Agricultural Science Database</collection><collection>Computing Database</collection><collection>Military Database</collection><collection>Research Library</collection><collection>Science Database</collection><collection>Telecommunications Database</collection><collection>Biological Science Database</collection><collection>Engineering Database</collection><collection>Research Library (Corporate)</collection><collection>Advanced Technologies & Aerospace 
Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Environmental Science Database</collection><collection>Materials Science Collection</collection><collection>ProQuest One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Engineering Collection</collection><collection>Environmental Science Collection</collection><collection>ProQuest Central Basic</collection><collection>Computer and Information Systems Abstracts</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>SIAM journal on scientific computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ghysels, P</au><au>Ashby, T J</au><au>Meerbergen, K</au><au>Vanroose, W</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Hiding Global Communication Latency in the GMRES Algorithm on Massively Parallel Machines</atitle><jtitle>SIAM journal on scientific computing</jtitle><date>2013-01-01</date><risdate>2013</risdate><volume>35</volume><issue>1</issue><spage>C48</spage><epage>C71</epage><pages>C48-C71</pages><issn>1064-8275</issn><eissn>1095-7197</eissn><abstract>In the generalized minimal residual method (GMRES), the global all-to-all communication required in each iteration for 
orthogonalization and normalization of the Krylov base vectors is becoming a performance bottleneck on massively parallel machines. Long latencies, system noise, and load imbalance cause these global reductions to become very costly global synchronizations. In this work, we propose the use of nonblocking or asynchronous global reductions to hide these global communication latencies by overlapping them with other communications and calculations. A pipelined variation of GMRES is presented in which the result of a global reduction is used only one or more iterations after the communication phase has started. This way, global synchronization is relaxed and scalability is much improved at the expense of some extra computations. The numerical instabilities that inevitably arise due to the typical monomial basis by powering the matrix are reduced and often annihilated by using Newton or Chebyshev bases instead. Our parallel experiments on a medium-sized cluster show significant speedups of the pipelined solvers compared to standard GMRES. An analytical model is used to extrapolate the performance to future exascale systems. [PUBLICATION ABSTRACT]</abstract><cop>Philadelphia</cop><pub>Society for Industrial and Applied Mathematics</pub><doi>10.1137/12086563X</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1064-8275
ispartof	SIAM journal on scientific computing, 2013-01, Vol.35 (1), p.C48-C71
issn	1064-8275; 1095-7197
language	eng
recordid	cdi_proquest_miscellaneous_1315645991
source	SIAM Journals Online
subjects	Algorithms; Communication; Computation; Computer science; Eigenvalues; Iterative methods; Mathematical analysis; Mathematical models; Methods; Noise; Reduction; Synchronism; Synchronization
title	Hiding Global Communication Latency in the GMRES Algorithm on Massively Parallel Machines
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T11%3A56%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Hiding%20Global%20Communication%20Latency%20in%20the%20GMRES%20Algorithm%20on%20Massively%20Parallel%20Machines&rft.jtitle=SIAM%20journal%20on%20scientific%20computing&rft.au=Ghysels,%20P&rft.date=2013-01-01&rft.volume=35&rft.issue=1&rft.spage=C48&rft.epage=C71&rft.pages=C48-C71&rft.issn=1064-8275&rft.eissn=1095-7197&rft_id=info:doi/10.1137/12086563X&rft_dat=%3Cproquest_cross%3E1315645991%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1267127384&rft_id=info:pmid/&rfr_iscdi=true
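
The abstract's remark that a monomial basis (built by powering the matrix) causes numerical instabilities, while a Newton basis reduces them, can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not the authors' method. It uses a diagonal matrix with eigenvalues 1 to 8, hypothetical shifts 8, 1, and 4.5, and the largest pairwise cosine between basis vectors as a crude proxy for how close the basis is to linear dependence. The function names are invented for this example.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


def krylov_basis(eigs, shifts):
    """Vectors v, (A - s0 I) v, (A - s1 I)(A - s0 I) v, ... for A = diag(eigs).

    With all shifts equal to zero this is the monomial basis v, A v, A^2 v, ...
    """
    v = [1.0] * len(eigs)
    basis = [v]
    for s in shifts:
        v = [(lam - s) * vi for lam, vi in zip(eigs, v)]
        basis.append(v)
    return basis


def worst_alignment(basis):
    """Largest |cosine| over all pairs; values near 1 mean near dependence."""
    return max(
        abs(cosine(basis[i], basis[j]))
        for i in range(len(basis))
        for j in range(i + 1, len(basis))
    )


EIGS = [float(k) for k in range(1, 9)]             # toy spectrum (assumption)
MONOMIAL = krylov_basis(EIGS, [0.0, 0.0, 0.0])     # v, A v, A^2 v, A^3 v
NEWTON = krylov_basis(EIGS, [8.0, 1.0, 4.5])       # hypothetical shifts

if __name__ == "__main__":
    print("monomial:", worst_alignment(MONOMIAL))
    print("newton:  ", worst_alignment(NEWTON))
```

In this toy setting the monomial vectors line up with the dominant eigenvector as the power grows, whereas shifts spread across the spectrum keep the Newton vectors better separated. How the shifts are chosen in the article itself is not stated in this record.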