Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication

Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős–Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first implementation of the 3D SpGEMM formulation that exploits multiple (intranode and internode) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrencies. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.
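To make the primitive concrete: the following is a minimal sequential sketch of SpGEMM using Gustavson's row-by-row algorithm. It is purely illustrative and is not the paper's communication-avoiding 3D algorithm; the dict-of-dicts storage is a hypothetical stand-in for the CSR-style formats real implementations use.

```python
def spgemm(A, B):
    """Return C = A * B for sparse matrices stored as
    {row: {col: value}} dicts (a toy stand-in for CSR)."""
    C = {}
    for i, row_a in A.items():
        acc = {}  # sparse accumulator for row i of C
        for k, a_ik in row_a.items():
            # scale row k of B by a_ik and merge into the accumulator
            for j, b_kj in B.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        if acc:
            C[i] = acc
    return C

# Small usage example:
A = {0: {0: 1.0, 1: 2.0}, 1: {1: 3.0}}
B = {0: {1: 4.0}, 1: {0: 5.0}}
C = spgemm(A, B)  # -> {0: {1: 4.0, 0: 10.0}, 1: {0: 15.0}}
```

The flop and communication costs of parallelizing this loop nest depend heavily on the nonzero structure of A and B, which is why the paper's analysis beyond the Erdős–Rényi case matters.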

Detailed Description

Bibliographic Details
Published in: SIAM journal on scientific computing 2016-01, Vol.38 (6), p.C624-C651
Main authors: Azad, Ariful, Ballard, Grey, Buluç, Aydin, Demmel, James, Grigori, Laura, Schwartz, Oded, Toledo, Sivan, Williams, Samuel
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page C651
container_issue 6
container_start_page C624
container_title SIAM journal on scientific computing
container_volume 38
creator Azad, Ariful
Ballard, Grey
Buluç, Aydin
Demmel, James
Grigori, Laura
Schwartz, Oded
Toledo, Sivan
Williams, Samuel
description Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős–Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first implementation of the 3D SpGEMM formulation that exploits multiple (intranode and internode) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrencies. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.
doi_str_mv 10.1137/15M104253X
format Article
fulltext fulltext
identifier ISSN: 1064-8275
ispartof SIAM journal on scientific computing, 2016-01, Vol.38 (6), p.C624-C651
issn 1064-8275
1095-7197
language eng
recordid cdi_osti_scitechconnect_1378775
source SIAM Journals Online
subjects 2.5D algorithms
2D decomposition
3D algorithms
Computer Science
graph algorithms
MATHEMATICS AND COMPUTING
multithreading
numerical linear algebra
parallel computing
sparse matrix-matrix multiplication
SpGEMM
title Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication