Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication
Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős–Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first implementation of the 3D SpGEMM formulation that exploits multiple (intranode and internode) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrency. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.
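To make the SpGEMM primitive concrete, here is a minimal serial sketch of Gustavson's row-wise algorithm, the sequential kernel that distributed 3D formulations like the paper's parallelize. This is not the authors' implementation (which is a C++/MPI/OpenMP code); the dict-of-dicts matrix representation below is a simplification chosen for readability, not the CSR/DCSC formats production libraries use.

```python
def spgemm(A, B):
    """Row-wise sparse matrix product C = A * B.

    A and B are sparse matrices stored as {row: {col: value}} dicts
    (a toy stand-in for CSR). For each nonzero A[i][k], the k-th row
    of B is scaled and merged into a sparse accumulator for row i of C.
    """
    C = {}
    for i, row_a in A.items():
        acc = {}  # sparse accumulator for row i of the output
        for k, a_ik in row_a.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a_ik * b_kj
        if acc:
            C[i] = acc
    return C


# Tiny example: a 2x2 product with a handful of nonzeros.
A = {0: {0: 1, 1: 2}, 1: {1: 3}}
B = {0: {1: 4}, 1: {0: 5}}
print(spgemm(A, B))
```

The communication cost the abstract refers to arises when such rows of A and B live on different nodes: 2D algorithms tile C over a process grid, while 3D (2.5D) algorithms replicate data along a third process-grid dimension to trade memory for reduced communication.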
Saved in:
Published in: | SIAM journal on scientific computing 2016-01, Vol.38 (6), p.C624-C651 |
---|---|
Main authors: | Azad, Ariful; Ballard, Grey; Buluç, Aydin; Demmel, James; Grigori, Laura; Schwartz, Oded; Toledo, Sivan; Williams, Samuel |
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Full text |
container_end_page | C651 |
---|---|
container_issue | 6 |
container_start_page | C624 |
container_title | SIAM journal on scientific computing |
container_volume | 38 |
creator | Azad, Ariful; Ballard, Grey; Buluç, Aydin; Demmel, James; Grigori, Laura; Schwartz, Oded; Toledo, Sivan; Williams, Samuel |
description | Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős–Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first implementation of the 3D SpGEMM formulation that exploits multiple (intranode and internode) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrency. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research. |
doi_str_mv | 10.1137/15M104253X |
format | Article |
fulltext | fulltext |
identifier | ISSN: 1064-8275 |
ispartof | SIAM journal on scientific computing, 2016-01, Vol.38 (6), p.C624-C651 |
issn | 1064-8275 (print); 1095-7197 (electronic) |
language | eng |
recordid | cdi_osti_scitechconnect_1378775 |
source | SIAM Journals Online |
subjects | 2.5D algorithms; 2D decomposition; 3D algorithms; Computer Science; graph algorithms; MATHEMATICS AND COMPUTING; multithreading; numerical linear algebra; parallel computing; sparse matrix-matrix multiplication; SpGEMM |
title | Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication |