Communication-overlap techniques for improved strong scaling of gyrokinetic Eulerian code beyond 100k cores on the K-computer
Plasma turbulence research based on five-dimensional (5D) gyrokinetic simulations is one of the most critical and demanding issues in fusion science. To pioneer new physics regimes both in problem sizes and in timescales, an improvement of strong scaling is essential. Overlap of computations and com...
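Published in: | The international journal of high performance computing applications 2014-02, Vol.28 (1), p.73-86 |
---|---|
Main authors: | Idomura, Yasuhiro; Nakata, Motoki; Yamada, Susumu; Machida, Masahiko; Imamura, Toshiyuki; Watanabe, Tomohiko; Nunami, Masanori; Inoue, Hikaru; Tsutsumi, Shigenobu; Miyoshi, Ikuo; Shida, Naoyuki |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Full text |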
container_end_page | 86 |
---|---|
container_issue | 1 |
container_start_page | 73 |
container_title | The international journal of high performance computing applications |
container_volume | 28 |
creator | Idomura, Yasuhiro; Nakata, Motoki; Yamada, Susumu; Machida, Masahiko; Imamura, Toshiyuki; Watanabe, Tomohiko; Nunami, Masanori; Inoue, Hikaru; Tsutsumi, Shigenobu; Miyoshi, Ikuo; Shida, Naoyuki |
description | Plasma turbulence research based on five-dimensional (5D) gyrokinetic simulations is one of the most critical and demanding issues in fusion science. To pioneer new physics regimes both in problem sizes and in timescales, an improvement of strong scaling is essential. Overlap of computations and communications using non-blocking MPI communication schemes is a promising approach to improving strong scaling, but it often fails on practical applications with conventional MPI libraries. In this work, this classical issue is resolved by developing communication-overlap techniques with additional MPI support for non-blocking communication routines and with heterogeneous OpenMP threads, which work even on conventional MPI libraries and network hardware. These techniques dramatically improved the parallel efficiency of the gyrokinetic toroidal 5D Eulerian code GT5D on the K-computer, which has a dedicated network, and on the Helios system, which has a commodity network. On the K-computer, excellent strong scaling was achieved beyond 100k cores whilst keeping a sustained performance of ~10% (~307 TFlops using 196,608 cores), and simulations for next-generation large-scale fusion experiments are significantly accelerated. This performance is a 16× speedup compared with the maximum performance reported at the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (~19 TFlops using 16,384 cores of the BX900 cluster) (Idomura, 2011). |
doi_str_mv | 10.1177/1094342013490973 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_1492270687</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sage_id>10.1177_1094342013490973</sage_id><sourcerecordid>3193703921</sourcerecordid><originalsourceid>FETCH-LOGICAL-c405t-bc1473d9684171cd533e34e28bbd5df66d0bcfa178a7f3c885a9ea873aed81343</originalsourceid><addsrcrecordid>eNp1UE1LxDAQDaLgWr17DIjHaNKkTXqUZf3ABS96LmmS7ma3TdakFfbgfzdlRUTwNMO8N-_NPAAuCb4hhPNbgitGWY4JZRWuOD0CM8IZQblg5XHqE4wm_BScxbjBGJeMFjPwOfd9Pzqr5GC9Q_7DhE7u4GDU2tn30UTY-gBtvwsJ0jAOwbsVjEp2NlXfwtU--K11ZrAKLsbOBCsdVF4b2Ji9dxoSjLdpEJKUd3BYG_iMlO9342DCOThpZRfNxXfNwNv94nX-iJYvD0_zuyVSDBcDahRhnOqqFIxwonRBqaHM5KJpdKHbstS4Ua0kXEjeUiVEISsjBafSaJECoRm4OuimN6anhnrjx-CSZU1Yleccl4mdAXxgqeBjDKatd8H2Muxrgusp5PpvyGnl-ltYTpm0QTpl489eLkhV8HRCBtCBF-XK_DL_T_cLygmKXg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1492270687</pqid></control><display><type>article</type><title>Communication-overlap techniques for improved strong scaling of gyrokinetic Eulerian code beyond 100k cores on the K-computer</title><source>SAGE Publications</source><source>Alma/SFX Local Collection</source><creator>Idomura, Yasuhiro ; Nakata, Motoki ; Yamada, Susumu ; Machida, Masahiko ; Imamura, Toshiyuki ; Watanabe, Tomohiko ; Nunami, Masanori ; Inoue, Hikaru ; Tsutsumi, Shigenobu ; Miyoshi, Ikuo ; Shida, Naoyuki</creator><creatorcontrib>Idomura, Yasuhiro ; Nakata, Motoki ; Yamada, Susumu ; Machida, Masahiko ; Imamura, Toshiyuki ; Watanabe, Tomohiko ; Nunami, Masanori ; Inoue, Hikaru ; Tsutsumi, Shigenobu ; Miyoshi, Ikuo ; Shida, Naoyuki</creatorcontrib><description>Plasma turbulence research based on five-dimensional (5D) gyrokinetic simulations is one of the most critical and demanding issues in fusion science. To pioneer new physics regimes both in problem sizes and in timescales, an improvement of strong scaling is essential. 
Overlap of computations and communications using non-blocking MPI communication schemes is a promising approach to improving strong scaling, but it often fails on practical applications with conventional MPI libraries. In this work, this classical issue is resolved by developing communication-overlap techniques with additional MPI support for non-blocking communication routines and with heterogeneous OpenMP threads, which work even on conventional MPI libraries and network hardware. These techniques dramatically improved the parallel efficiency of a gyrokinetic toroidal 5D Eulerian code GT5D on the K-computer, which has a dedicated network, and on the Helios system which has a commodity network. On the K-computer, excellent strong scaling was achieved beyond 100k cores whilst keeping a sustained performance of
~10% (~307 TFlops using 196,608 cores), and simulations for next-generation large-scale fusion experiments are significantly accelerated. This performance is 16× sped up compared with the maximum performance reported at the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (~
19 TFlops using 16,384 cores of the BX900 cluster) (Idomura, 2011).</description><identifier>ISSN: 1094-3420</identifier><identifier>EISSN: 1741-2846</identifier><identifier>DOI: 10.1177/1094342013490973</identifier><language>eng</language><publisher>London, England: SAGE Publications</publisher><subject>Applied sciences ; Communication ; Computer science; control theory; systems ; Computer systems and distributed systems. User interface ; Eulers equations ; Exact sciences and technology ; Fusion ; Gyrokinetics ; High performance computing ; Libraries ; Physics ; Physics of gases, plasmas and electric discharges ; Physics of plasmas and electric discharges ; Plasma dynamics and flow ; Plasma physics ; Simulation ; Software ; Studies</subject><ispartof>The international journal of high performance computing applications, 2014-02, Vol.28 (1), p.73-86</ispartof><rights>The Author(s) 2012</rights><rights>2015 INIST-CNRS</rights><rights>Copyright SAGE PUBLICATIONS, INC. Feb 2014</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c405t-bc1473d9684171cd533e34e28bbd5df66d0bcfa178a7f3c885a9ea873aed81343</citedby><cites>FETCH-LOGICAL-c405t-bc1473d9684171cd533e34e28bbd5df66d0bcfa178a7f3c885a9ea873aed81343</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://journals.sagepub.com/doi/pdf/10.1177/1094342013490973$$EPDF$$P50$$Gsage$$H</linktopdf><linktohtml>$$Uhttps://journals.sagepub.com/doi/10.1177/1094342013490973$$EHTML$$P50$$Gsage$$H</linktohtml><link.rule.ids>314,780,784,21818,27923,27924,43620,43621</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=28195781$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Idomura, Yasuhiro</creatorcontrib><creatorcontrib>Nakata, 
Motoki</creatorcontrib><creatorcontrib>Yamada, Susumu</creatorcontrib><creatorcontrib>Machida, Masahiko</creatorcontrib><creatorcontrib>Imamura, Toshiyuki</creatorcontrib><creatorcontrib>Watanabe, Tomohiko</creatorcontrib><creatorcontrib>Nunami, Masanori</creatorcontrib><creatorcontrib>Inoue, Hikaru</creatorcontrib><creatorcontrib>Tsutsumi, Shigenobu</creatorcontrib><creatorcontrib>Miyoshi, Ikuo</creatorcontrib><creatorcontrib>Shida, Naoyuki</creatorcontrib><title>Communication-overlap techniques for improved strong scaling of gyrokinetic Eulerian code beyond 100k cores on the K-computer</title><title>The international journal of high performance computing applications</title><description>Plasma turbulence research based on five-dimensional (5D) gyrokinetic simulations is one of the most critical and demanding issues in fusion science. To pioneer new physics regimes both in problem sizes and in timescales, an improvement of strong scaling is essential. Overlap of computations and communications using non-blocking MPI communication schemes is a promising approach to improving strong scaling, but it often fails on practical applications with conventional MPI libraries. In this work, this classical issue is resolved by developing communication-overlap techniques with additional MPI support for non-blocking communication routines and with heterogeneous OpenMP threads, which work even on conventional MPI libraries and network hardware. These techniques dramatically improved the parallel efficiency of a gyrokinetic toroidal 5D Eulerian code GT5D on the K-computer, which has a dedicated network, and on the Helios system which has a commodity network. On the K-computer, excellent strong scaling was achieved beyond 100k cores whilst keeping a sustained performance of
~10% (~307 TFlops using 196,608 cores), and simulations for next-generation large-scale fusion experiments are significantly accelerated. This performance is 16× sped up compared with the maximum performance reported at the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (~
19 TFlops using 16,384 cores of the BX900 cluster) (Idomura, 2011).</description><subject>Applied sciences</subject><subject>Communication</subject><subject>Computer science; control theory; systems</subject><subject>Computer systems and distributed systems. User interface</subject><subject>Eulers equations</subject><subject>Exact sciences and technology</subject><subject>Fusion</subject><subject>Gyrokinetics</subject><subject>High performance computing</subject><subject>Libraries</subject><subject>Physics</subject><subject>Physics of gases, plasmas and electric discharges</subject><subject>Physics of plasmas and electric discharges</subject><subject>Plasma dynamics and flow</subject><subject>Plasma physics</subject><subject>Simulation</subject><subject>Software</subject><subject>Studies</subject><issn>1094-3420</issn><issn>1741-2846</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2014</creationdate><recordtype>article</recordtype><recordid>eNp1UE1LxDAQDaLgWr17DIjHaNKkTXqUZf3ABS96LmmS7ma3TdakFfbgfzdlRUTwNMO8N-_NPAAuCb4hhPNbgitGWY4JZRWuOD0CM8IZQblg5XHqE4wm_BScxbjBGJeMFjPwOfd9Pzqr5GC9Q_7DhE7u4GDU2tn30UTY-gBtvwsJ0jAOwbsVjEp2NlXfwtU--K11ZrAKLsbOBCsdVF4b2Ji9dxoSjLdpEJKUd3BYG_iMlO9342DCOThpZRfNxXfNwNv94nX-iJYvD0_zuyVSDBcDahRhnOqqFIxwonRBqaHM5KJpdKHbstS4Ua0kXEjeUiVEISsjBafSaJECoRm4OuimN6anhnrjx-CSZU1Yleccl4mdAXxgqeBjDKatd8H2Muxrgusp5PpvyGnl-ltYTpm0QTpl489eLkhV8HRCBtCBF-XK_DL_T_cLygmKXg</recordid><startdate>20140201</startdate><enddate>20140201</enddate><creator>Idomura, Yasuhiro</creator><creator>Nakata, Motoki</creator><creator>Yamada, Susumu</creator><creator>Machida, Masahiko</creator><creator>Imamura, Toshiyuki</creator><creator>Watanabe, Tomohiko</creator><creator>Nunami, Masanori</creator><creator>Inoue, Hikaru</creator><creator>Tsutsumi, Shigenobu</creator><creator>Miyoshi, Ikuo</creator><creator>Shida, Naoyuki</creator><general>SAGE Publications</general><general>Sage Publications</general><general>SAGE PUBLICATIONS, 
INC</general><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20140201</creationdate><title>Communication-overlap techniques for improved strong scaling of gyrokinetic Eulerian code beyond 100k cores on the K-computer</title><author>Idomura, Yasuhiro ; Nakata, Motoki ; Yamada, Susumu ; Machida, Masahiko ; Imamura, Toshiyuki ; Watanabe, Tomohiko ; Nunami, Masanori ; Inoue, Hikaru ; Tsutsumi, Shigenobu ; Miyoshi, Ikuo ; Shida, Naoyuki</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c405t-bc1473d9684171cd533e34e28bbd5df66d0bcfa178a7f3c885a9ea873aed81343</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2014</creationdate><topic>Applied sciences</topic><topic>Communication</topic><topic>Computer science; control theory; systems</topic><topic>Computer systems and distributed systems. 
User interface</topic><topic>Eulers equations</topic><topic>Exact sciences and technology</topic><topic>Fusion</topic><topic>Gyrokinetics</topic><topic>High performance computing</topic><topic>Libraries</topic><topic>Physics</topic><topic>Physics of gases, plasmas and electric discharges</topic><topic>Physics of plasmas and electric discharges</topic><topic>Plasma dynamics and flow</topic><topic>Plasma physics</topic><topic>Simulation</topic><topic>Software</topic><topic>Studies</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Idomura, Yasuhiro</creatorcontrib><creatorcontrib>Nakata, Motoki</creatorcontrib><creatorcontrib>Yamada, Susumu</creatorcontrib><creatorcontrib>Machida, Masahiko</creatorcontrib><creatorcontrib>Imamura, Toshiyuki</creatorcontrib><creatorcontrib>Watanabe, Tomohiko</creatorcontrib><creatorcontrib>Nunami, Masanori</creatorcontrib><creatorcontrib>Inoue, Hikaru</creatorcontrib><creatorcontrib>Tsutsumi, Shigenobu</creatorcontrib><creatorcontrib>Miyoshi, Ikuo</creatorcontrib><creatorcontrib>Shida, Naoyuki</creatorcontrib><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>The international journal of high performance computing applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Idomura, Yasuhiro</au><au>Nakata, Motoki</au><au>Yamada, Susumu</au><au>Machida, Masahiko</au><au>Imamura, Toshiyuki</au><au>Watanabe, Tomohiko</au><au>Nunami, Masanori</au><au>Inoue, Hikaru</au><au>Tsutsumi, 
Shigenobu</au><au>Miyoshi, Ikuo</au><au>Shida, Naoyuki</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Communication-overlap techniques for improved strong scaling of gyrokinetic Eulerian code beyond 100k cores on the K-computer</atitle><jtitle>The international journal of high performance computing applications</jtitle><date>2014-02-01</date><risdate>2014</risdate><volume>28</volume><issue>1</issue><spage>73</spage><epage>86</epage><pages>73-86</pages><issn>1094-3420</issn><eissn>1741-2846</eissn><abstract>Plasma turbulence research based on five-dimensional (5D) gyrokinetic simulations is one of the most critical and demanding issues in fusion science. To pioneer new physics regimes both in problem sizes and in timescales, an improvement of strong scaling is essential. Overlap of computations and communications using non-blocking MPI communication schemes is a promising approach to improving strong scaling, but it often fails on practical applications with conventional MPI libraries. In this work, this classical issue is resolved by developing communication-overlap techniques with additional MPI support for non-blocking communication routines and with heterogeneous OpenMP threads, which work even on conventional MPI libraries and network hardware. These techniques dramatically improved the parallel efficiency of a gyrokinetic toroidal 5D Eulerian code GT5D on the K-computer, which has a dedicated network, and on the Helios system which has a commodity network. On the K-computer, excellent strong scaling was achieved beyond 100k cores whilst keeping a sustained performance of
~10% (~307 TFlops using 196,608 cores), and simulations for next-generation large-scale fusion experiments are significantly accelerated. This performance is 16× sped up compared with the maximum performance reported at the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (~
19 TFlops using 16,384 cores of the BX900 cluster) (Idomura, 2011).</abstract><cop>London, England</cop><pub>SAGE Publications</pub><doi>10.1177/1094342013490973</doi><tpages>14</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1094-3420 |
ispartof | The international journal of high performance computing applications, 2014-02, Vol.28 (1), p.73-86 |
issn | 1094-3420; 1741-2846 |
language | eng |
recordid | cdi_proquest_journals_1492270687 |
source | SAGE Publications; Alma/SFX Local Collection |
subjects | Applied sciences; Communication; Computer science, control theory, systems; Computer systems and distributed systems. User interface; Eulers equations; Exact sciences and technology; Fusion; Gyrokinetics; High performance computing; Libraries; Physics; Physics of gases, plasmas and electric discharges; Physics of plasmas and electric discharges; Plasma dynamics and flow; Plasma physics; Simulation; Software; Studies |
title | Communication-overlap techniques for improved strong scaling of gyrokinetic Eulerian code beyond 100k cores on the K-computer |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T18%3A38%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Communication-overlap%20techniques%20for%20improved%20strong%20scaling%20of%20gyrokinetic%20Eulerian%20code%20beyond%20100k%20cores%20on%20the%20K-computer&rft.jtitle=The%20international%20journal%20of%20high%20performance%20computing%20applications&rft.au=Idomura,%20Yasuhiro&rft.date=2014-02-01&rft.volume=28&rft.issue=1&rft.spage=73&rft.epage=86&rft.pages=73-86&rft.issn=1094-3420&rft.eissn=1741-2846&rft_id=info:doi/10.1177/1094342013490973&rft_dat=%3Cproquest_cross%3E3193703921%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1492270687&rft_id=info:pmid/&rft_sage_id=10.1177_1094342013490973&rfr_iscdi=true |
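
The performance figures quoted in the description can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration based on the rounded numbers in the abstract (~307 TFlops on 196,608 cores versus ~19 TFlops on 16,384 cores); the function and variable names are hypothetical and do not come from the paper or from GT5D.

```python
# Approximate figures as quoted in the abstract (rounded in the source).
K_TFLOPS, K_CORES = 307.0, 196_608          # K-computer run
BX900_TFLOPS, BX900_CORES = 19.0, 16_384    # 2011 BX900 cluster run


def speedup(new_tflops: float, old_tflops: float) -> float:
    """Ratio of sustained performance between two runs."""
    return new_tflops / old_tflops


def gflops_per_core(tflops: float, cores: int) -> float:
    """Sustained GFlops delivered per core (1 TFlops = 1000 GFlops)."""
    return tflops * 1000.0 / cores


total_speedup = speedup(K_TFLOPS, BX900_TFLOPS)
core_ratio = K_CORES / BX900_CORES

# Prints: speedup: 16.2x on 12x the cores
print(f"speedup: {total_speedup:.1f}x on {core_ratio:.0f}x the cores")
```

With these inputs the ratio comes out at roughly 16, matching the "16×" stated in the description, while the core count grows by a factor of 12. The per-core helper makes it easy to compare the two runs on equal footing, keeping in mind that they were measured on different machines.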