Communication-overlap techniques for improved strong scaling of gyrokinetic Eulerian code beyond 100k cores on the K-computer

Plasma turbulence research based on five-dimensional (5D) gyrokinetic simulations is one of the most critical and demanding issues in fusion science. To pioneer new physics regimes both in problem sizes and in timescales, an improvement of strong scaling is essential. Overlap of computations and com...

Bibliographic details
Published in: The international journal of high performance computing applications 2014-02, Vol.28 (1), p.73-86
Main authors: Idomura, Yasuhiro; Nakata, Motoki; Yamada, Susumu; Machida, Masahiko; Imamura, Toshiyuki; Watanabe, Tomohiko; Nunami, Masanori; Inoue, Hikaru; Tsutsumi, Shigenobu; Miyoshi, Ikuo; Shida, Naoyuki
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 86
container_issue 1
container_start_page 73
container_title The international journal of high performance computing applications
container_volume 28
creator Idomura, Yasuhiro
Nakata, Motoki
Yamada, Susumu
Machida, Masahiko
Imamura, Toshiyuki
Watanabe, Tomohiko
Nunami, Masanori
Inoue, Hikaru
Tsutsumi, Shigenobu
Miyoshi, Ikuo
Shida, Naoyuki
description Plasma turbulence research based on five-dimensional (5D) gyrokinetic simulations is one of the most critical and demanding issues in fusion science. To pioneer new physics regimes both in problem sizes and in timescales, an improvement of strong scaling is essential. Overlap of computations and communications using non-blocking MPI communication schemes is a promising approach to improving strong scaling, but it often fails on practical applications with conventional MPI libraries. In this work, this classical issue is resolved by developing communication-overlap techniques with additional MPI support for non-blocking communication routines and with heterogeneous OpenMP threads, which work even on conventional MPI libraries and network hardware. These techniques dramatically improved the parallel efficiency of a gyrokinetic toroidal 5D Eulerian code GT5D on the K-computer, which has a dedicated network, and on the Helios system, which has a commodity network. On the K-computer, excellent strong scaling was achieved beyond 100k cores whilst keeping a sustained performance of ~10% (~307 TFlops using 196,608 cores), and simulations for next-generation large-scale fusion experiments are significantly accelerated. This performance is a 16× speedup over the maximum performance reported at the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (~19 TFlops using 16,384 cores of the BX900 cluster) (Idomura, 2011).
doi_str_mv 10.1177/1094342013490973
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_1492270687</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sage_id>10.1177_1094342013490973</sage_id><sourcerecordid>3193703921</sourcerecordid><originalsourceid>FETCH-LOGICAL-c405t-bc1473d9684171cd533e34e28bbd5df66d0bcfa178a7f3c885a9ea873aed81343</originalsourceid><addsrcrecordid>eNp1UE1LxDAQDaLgWr17DIjHaNKkTXqUZf3ABS96LmmS7ma3TdakFfbgfzdlRUTwNMO8N-_NPAAuCb4hhPNbgitGWY4JZRWuOD0CM8IZQblg5XHqE4wm_BScxbjBGJeMFjPwOfd9Pzqr5GC9Q_7DhE7u4GDU2tn30UTY-gBtvwsJ0jAOwbsVjEp2NlXfwtU--K11ZrAKLsbOBCsdVF4b2Ji9dxoSjLdpEJKUd3BYG_iMlO9342DCOThpZRfNxXfNwNv94nX-iJYvD0_zuyVSDBcDahRhnOqqFIxwonRBqaHM5KJpdKHbstS4Ua0kXEjeUiVEISsjBafSaJECoRm4OuimN6anhnrjx-CSZU1Yleccl4mdAXxgqeBjDKatd8H2Muxrgusp5PpvyGnl-ltYTpm0QTpl489eLkhV8HRCBtCBF-XK_DL_T_cLygmKXg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1492270687</pqid></control><display><type>article</type><title>Communication-overlap techniques for improved strong scaling of gyrokinetic Eulerian code beyond 100k cores on the K-computer</title><source>SAGE Publications</source><source>Alma/SFX Local Collection</source><creator>Idomura, Yasuhiro ; Nakata, Motoki ; Yamada, Susumu ; Machida, Masahiko ; Imamura, Toshiyuki ; Watanabe, Tomohiko ; Nunami, Masanori ; Inoue, Hikaru ; Tsutsumi, Shigenobu ; Miyoshi, Ikuo ; Shida, Naoyuki</creator><creatorcontrib>Idomura, Yasuhiro ; Nakata, Motoki ; Yamada, Susumu ; Machida, Masahiko ; Imamura, Toshiyuki ; Watanabe, Tomohiko ; Nunami, Masanori ; Inoue, Hikaru ; Tsutsumi, Shigenobu ; Miyoshi, Ikuo ; Shida, Naoyuki</creatorcontrib><description>Plasma turbulence research based on five-dimensional (5D) gyrokinetic simulations is one of the most critical and demanding issues in fusion science. To pioneer new physics regimes both in problem sizes and in timescales, an improvement of strong scaling is essential. 
Overlap of computations and communications using non-blocking MPI communication schemes is a promising approach to improving strong scaling, but it often fails on practical applications with conventional MPI libraries. In this work, this classical issue is resolved by developing communication-overlap techniques with additional MPI support for non-blocking communication routines and with heterogeneous OpenMP threads, which work even on conventional MPI libraries and network hardware. These techniques dramatically improved the parallel efficiency of a gyrokinetic toroidal 5D Eulerian code GT5D on the K-computer, which has a dedicated network, and on the Helios system which has a commodity network. On the K-computer, excellent strong scaling was achieved beyond 100k cores whilst keeping a sustained performance of ~ 10% ( ~ 307 TFlops using 196,608 cores), and simulations for next-generation large-scale fusion experiments are significantly accelerated. This performance is 16 × sped up compared with the maximum performance reported at the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis ( ~ 19 TFlops using 16,384 cores of the BX900 cluster) (Idomura, 2011).</description><identifier>ISSN: 1094-3420</identifier><identifier>EISSN: 1741-2846</identifier><identifier>DOI: 10.1177/1094342013490973</identifier><language>eng</language><publisher>London, England: SAGE Publications</publisher><subject>Applied sciences ; Communication ; Computer science; control theory; systems ; Computer systems and distributed systems. 
User interface ; Eulers equations ; Exact sciences and technology ; Fusion ; Gyrokinetics ; High performance computing ; Libraries ; Physics ; Physics of gases, plasmas and electric discharges ; Physics of plasmas and electric discharges ; Plasma dynamics and flow ; Plasma physics ; Simulation ; Software ; Studies</subject><ispartof>The international journal of high performance computing applications, 2014-02, Vol.28 (1), p.73-86</ispartof><rights>The Author(s) 2012</rights><rights>2015 INIST-CNRS</rights><rights>Copyright SAGE PUBLICATIONS, INC. Feb 2014</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c405t-bc1473d9684171cd533e34e28bbd5df66d0bcfa178a7f3c885a9ea873aed81343</citedby><cites>FETCH-LOGICAL-c405t-bc1473d9684171cd533e34e28bbd5df66d0bcfa178a7f3c885a9ea873aed81343</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://journals.sagepub.com/doi/pdf/10.1177/1094342013490973$$EPDF$$P50$$Gsage$$H</linktopdf><linktohtml>$$Uhttps://journals.sagepub.com/doi/10.1177/1094342013490973$$EHTML$$P50$$Gsage$$H</linktohtml><link.rule.ids>314,780,784,21818,27923,27924,43620,43621</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=28195781$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Idomura, Yasuhiro</creatorcontrib><creatorcontrib>Nakata, Motoki</creatorcontrib><creatorcontrib>Yamada, Susumu</creatorcontrib><creatorcontrib>Machida, Masahiko</creatorcontrib><creatorcontrib>Imamura, Toshiyuki</creatorcontrib><creatorcontrib>Watanabe, Tomohiko</creatorcontrib><creatorcontrib>Nunami, Masanori</creatorcontrib><creatorcontrib>Inoue, Hikaru</creatorcontrib><creatorcontrib>Tsutsumi, Shigenobu</creatorcontrib><creatorcontrib>Miyoshi, Ikuo</creatorcontrib><creatorcontrib>Shida, 
Naoyuki</creatorcontrib><title>Communication-overlap techniques for improved strong scaling of gyrokinetic Eulerian code beyond 100k cores on the K-computer</title><title>The international journal of high performance computing applications</title><description>Plasma turbulence research based on five-dimensional (5D) gyrokinetic simulations is one of the most critical and demanding issues in fusion science. To pioneer new physics regimes both in problem sizes and in timescales, an improvement of strong scaling is essential. Overlap of computations and communications using non-blocking MPI communication schemes is a promising approach to improving strong scaling, but it often fails on practical applications with conventional MPI libraries. In this work, this classical issue is resolved by developing communication-overlap techniques with additional MPI support for non-blocking communication routines and with heterogeneous OpenMP threads, which work even on conventional MPI libraries and network hardware. These techniques dramatically improved the parallel efficiency of a gyrokinetic toroidal 5D Eulerian code GT5D on the K-computer, which has a dedicated network, and on the Helios system which has a commodity network. On the K-computer, excellent strong scaling was achieved beyond 100k cores whilst keeping a sustained performance of ~ 10% ( ~ 307 TFlops using 196,608 cores), and simulations for next-generation large-scale fusion experiments are significantly accelerated. This performance is 16 × sped up compared with the maximum performance reported at the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis ( ~ 19 TFlops using 16,384 cores of the BX900 cluster) (Idomura, 2011).</description><subject>Applied sciences</subject><subject>Communication</subject><subject>Computer science; control theory; systems</subject><subject>Computer systems and distributed systems. 
User interface</subject><subject>Eulers equations</subject><subject>Exact sciences and technology</subject><subject>Fusion</subject><subject>Gyrokinetics</subject><subject>High performance computing</subject><subject>Libraries</subject><subject>Physics</subject><subject>Physics of gases, plasmas and electric discharges</subject><subject>Physics of plasmas and electric discharges</subject><subject>Plasma dynamics and flow</subject><subject>Plasma physics</subject><subject>Simulation</subject><subject>Software</subject><subject>Studies</subject><issn>1094-3420</issn><issn>1741-2846</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2014</creationdate><recordtype>article</recordtype><recordid>eNp1UE1LxDAQDaLgWr17DIjHaNKkTXqUZf3ABS96LmmS7ma3TdakFfbgfzdlRUTwNMO8N-_NPAAuCb4hhPNbgitGWY4JZRWuOD0CM8IZQblg5XHqE4wm_BScxbjBGJeMFjPwOfd9Pzqr5GC9Q_7DhE7u4GDU2tn30UTY-gBtvwsJ0jAOwbsVjEp2NlXfwtU--K11ZrAKLsbOBCsdVF4b2Ji9dxoSjLdpEJKUd3BYG_iMlO9342DCOThpZRfNxXfNwNv94nX-iJYvD0_zuyVSDBcDahRhnOqqFIxwonRBqaHM5KJpdKHbstS4Ua0kXEjeUiVEISsjBafSaJECoRm4OuimN6anhnrjx-CSZU1Yleccl4mdAXxgqeBjDKatd8H2Muxrgusp5PpvyGnl-ltYTpm0QTpl489eLkhV8HRCBtCBF-XK_DL_T_cLygmKXg</recordid><startdate>20140201</startdate><enddate>20140201</enddate><creator>Idomura, Yasuhiro</creator><creator>Nakata, Motoki</creator><creator>Yamada, Susumu</creator><creator>Machida, Masahiko</creator><creator>Imamura, Toshiyuki</creator><creator>Watanabe, Tomohiko</creator><creator>Nunami, Masanori</creator><creator>Inoue, Hikaru</creator><creator>Tsutsumi, Shigenobu</creator><creator>Miyoshi, Ikuo</creator><creator>Shida, Naoyuki</creator><general>SAGE Publications</general><general>Sage Publications</general><general>SAGE PUBLICATIONS, INC</general><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20140201</creationdate><title>Communication-overlap techniques for 
improved strong scaling of gyrokinetic Eulerian code beyond 100k cores on the K-computer</title><author>Idomura, Yasuhiro ; Nakata, Motoki ; Yamada, Susumu ; Machida, Masahiko ; Imamura, Toshiyuki ; Watanabe, Tomohiko ; Nunami, Masanori ; Inoue, Hikaru ; Tsutsumi, Shigenobu ; Miyoshi, Ikuo ; Shida, Naoyuki</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c405t-bc1473d9684171cd533e34e28bbd5df66d0bcfa178a7f3c885a9ea873aed81343</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2014</creationdate><topic>Applied sciences</topic><topic>Communication</topic><topic>Computer science; control theory; systems</topic><topic>Computer systems and distributed systems. User interface</topic><topic>Eulers equations</topic><topic>Exact sciences and technology</topic><topic>Fusion</topic><topic>Gyrokinetics</topic><topic>High performance computing</topic><topic>Libraries</topic><topic>Physics</topic><topic>Physics of gases, plasmas and electric discharges</topic><topic>Physics of plasmas and electric discharges</topic><topic>Plasma dynamics and flow</topic><topic>Plasma physics</topic><topic>Simulation</topic><topic>Software</topic><topic>Studies</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Idomura, Yasuhiro</creatorcontrib><creatorcontrib>Nakata, Motoki</creatorcontrib><creatorcontrib>Yamada, Susumu</creatorcontrib><creatorcontrib>Machida, Masahiko</creatorcontrib><creatorcontrib>Imamura, Toshiyuki</creatorcontrib><creatorcontrib>Watanabe, Tomohiko</creatorcontrib><creatorcontrib>Nunami, Masanori</creatorcontrib><creatorcontrib>Inoue, Hikaru</creatorcontrib><creatorcontrib>Tsutsumi, Shigenobu</creatorcontrib><creatorcontrib>Miyoshi, Ikuo</creatorcontrib><creatorcontrib>Shida, Naoyuki</creatorcontrib><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems 
Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>The international journal of high performance computing applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Idomura, Yasuhiro</au><au>Nakata, Motoki</au><au>Yamada, Susumu</au><au>Machida, Masahiko</au><au>Imamura, Toshiyuki</au><au>Watanabe, Tomohiko</au><au>Nunami, Masanori</au><au>Inoue, Hikaru</au><au>Tsutsumi, Shigenobu</au><au>Miyoshi, Ikuo</au><au>Shida, Naoyuki</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Communication-overlap techniques for improved strong scaling of gyrokinetic Eulerian code beyond 100k cores on the K-computer</atitle><jtitle>The international journal of high performance computing applications</jtitle><date>2014-02-01</date><risdate>2014</risdate><volume>28</volume><issue>1</issue><spage>73</spage><epage>86</epage><pages>73-86</pages><issn>1094-3420</issn><eissn>1741-2846</eissn><abstract>Plasma turbulence research based on five-dimensional (5D) gyrokinetic simulations is one of the most critical and demanding issues in fusion science. To pioneer new physics regimes both in problem sizes and in timescales, an improvement of strong scaling is essential. Overlap of computations and communications using non-blocking MPI communication schemes is a promising approach to improving strong scaling, but it often fails on practical applications with conventional MPI libraries. 
In this work, this classical issue is resolved by developing communication-overlap techniques with additional MPI support for non-blocking communication routines and with heterogeneous OpenMP threads, which work even on conventional MPI libraries and network hardware. These techniques dramatically improved the parallel efficiency of a gyrokinetic toroidal 5D Eulerian code GT5D on the K-computer, which has a dedicated network, and on the Helios system which has a commodity network. On the K-computer, excellent strong scaling was achieved beyond 100k cores whilst keeping a sustained performance of ~ 10% ( ~ 307 TFlops using 196,608 cores), and simulations for next-generation large-scale fusion experiments are significantly accelerated. This performance is 16 × sped up compared with the maximum performance reported at the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis ( ~ 19 TFlops using 16,384 cores of the BX900 cluster) (Idomura, 2011).</abstract><cop>London, England</cop><pub>SAGE Publications</pub><doi>10.1177/1094342013490973</doi><tpages>14</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1094-3420
ispartof The international journal of high performance computing applications, 2014-02, Vol.28 (1), p.73-86
issn 1094-3420
1741-2846
language eng
recordid cdi_proquest_journals_1492270687
source SAGE Publications; Alma/SFX Local Collection
subjects Applied sciences
Communication
Computer science; control theory; systems
Computer systems and distributed systems. User interface
Eulers equations
Exact sciences and technology
Fusion
Gyrokinetics
High performance computing
Libraries
Physics
Physics of gases, plasmas and electric discharges
Physics of plasmas and electric discharges
Plasma dynamics and flow
Plasma physics
Simulation
Software
Studies
title Communication-overlap techniques for improved strong scaling of gyrokinetic Eulerian code beyond 100k cores on the K-computer
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T18%3A38%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Communication-overlap%20techniques%20for%20improved%20strong%20scaling%20of%20gyrokinetic%20Eulerian%20code%20beyond%20100k%20cores%20on%20the%20K-computer&rft.jtitle=The%20international%20journal%20of%20high%20performance%20computing%20applications&rft.au=Idomura,%20Yasuhiro&rft.date=2014-02-01&rft.volume=28&rft.issue=1&rft.spage=73&rft.epage=86&rft.pages=73-86&rft.issn=1094-3420&rft.eissn=1741-2846&rft_id=info:doi/10.1177/1094342013490973&rft_dat=%3Cproquest_cross%3E3193703921%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1492270687&rft_id=info:pmid/&rft_sage_id=10.1177_1094342013490973&rfr_iscdi=true