Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication
Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős–Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first implementation of the 3D SpGEMM formulation that exploits multiple (intranode and internode) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrency. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.
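To make the SpGEMM primitive concrete, here is a minimal serial sketch of Gustavson's row-wise algorithm, the sequential kernel that distributed 3D formulations like the paper's parallelize. This is not the authors' implementation (which is a C++/MPI/OpenMP code); the dict-of-dicts matrix representation below is a simplification chosen for readability, not the CSR/DCSC formats production libraries use.

```python
def spgemm(A, B):
    """Row-wise sparse matrix product C = A * B.

    A and B are sparse matrices stored as {row: {col: value}} dicts
    (a toy stand-in for CSR). For each nonzero A[i][k], the k-th row
    of B is scaled and merged into a sparse accumulator for row i of C.
    """
    C = {}
    for i, row_a in A.items():
        acc = {}  # sparse accumulator for row i of the output
        for k, a_ik in row_a.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a_ik * b_kj
        if acc:
            C[i] = acc
    return C


# Tiny example: a 2x2 product with a handful of nonzeros.
A = {0: {0: 1, 1: 2}, 1: {1: 3}}
B = {0: {1: 4}, 1: {0: 5}}
print(spgemm(A, B))
```

The communication cost the abstract refers to arises when such rows of A and B live on different nodes: 2D algorithms tile C over a process grid, while 3D (2.5D) algorithms replicate data along a third process-grid dimension to trade memory for reduced communication.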
Saved in:
Published in: | SIAM journal on scientific computing 2016-01, Vol.38 (6), p.C624-C651 |
---|---|
Main authors: | Azad, Ariful; Ballard, Grey; Buluç, Aydin; Demmel, James; Grigori, Laura; Schwartz, Oded; Toledo, Sivan; Williams, Samuel |
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Full text |
container_end_page | C651 |
---|---|
container_issue | 6 |
container_start_page | C624 |
container_title | SIAM journal on scientific computing |
container_volume | 38 |
creator | Azad, Ariful; Ballard, Grey; Buluç, Aydin; Demmel, James; Grigori, Laura; Schwartz, Oded; Toledo, Sivan; Williams, Samuel |
description | Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős–Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first implementation of the 3D SpGEMM formulation that exploits multiple (intranode and internode) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrency. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research. |
doi_str_mv | 10.1137/15M104253X |
format | Article |
fulltext | fulltext |
identifier | ISSN: 1064-8275 |
ispartof | SIAM journal on scientific computing, 2016-01, Vol.38 (6), p.C624-C651 |
issn | 1064-8275 (print); 1095-7197 (electronic) |
language | eng |
recordid | cdi_osti_scitechconnect_1378775 |
source | SIAM Journals Online |
subjects | 2.5D algorithms; 2D decomposition; 3D algorithms; Computer Science; graph algorithms; MATHEMATICS AND COMPUTING; multithreading; numerical linear algebra; parallel computing; sparse matrix-matrix multiplication; SpGEMM |
title | Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication |