Performance-Aware Model for Sparse Matrix-Matrix Multiplication on the Sunway TaihuLight Supercomputer
General sparse matrix-sparse matrix multiplication (SpGEMM) is one of the fundamental linear operations in a wide variety of scientific applications. To implement efficient SpGEMM for many large-scale applications, this paper proposes scalable and optimized SpGEMM kernels based on the COO, CSR, ELL, and CSC formats on the Sunway TaihuLight supercomputer.
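As background for the abstract above, the sketch below shows the row-by-row accumulation that a serial SpGEMM performs on COO-format inputs (triples of row, column, value). The function `spgemm_coo` and the triple-list representation are illustrative assumptions for exposition only, not the paper's parallel Sunway kernels.

```python
from collections import defaultdict

def spgemm_coo(a, b):
    """Multiply two sparse matrices given as COO triple lists [(row, col, val), ...].

    Uses a hash-map accumulator over output entries, a simple serial analogue
    of the Gustavson-style accumulation that parallel SpGEMM kernels perform.
    Returns the product as a sorted COO triple list.
    """
    # Index B's nonzeros by row so each nonzero a_ik can find matching b_kj quickly.
    b_rows = defaultdict(list)
    for k, j, v in b:
        b_rows[k].append((j, v))

    acc = defaultdict(float)  # (i, j) -> accumulated value of the product
    for i, k, av in a:
        for j, bv in b_rows[k]:
            acc[(i, j)] += av * bv
    # Drop explicit zeros that cancelled out and return deterministic order.
    return sorted((i, j, v) for (i, j), v in acc.items() if v != 0.0)
```

For example, multiplying A = [[1, 2], [0, 3]] by B = [[4, 0], [5, 6]] in this representation yields the COO triples of [[14, 12], [15, 18]].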
Saved in:
Published in: | IEEE transactions on parallel and distributed systems 2019-04, Vol.30 (4), p.923-938 |
---|---|
Main authors: | Chen, Yuedan ; Li, Kenli ; Yang, Wangdong ; Xiao, Guoqing ; Xie, Xianghui ; Li, Tao |
Format: | Article |
Language: | eng |
Subjects: | Sparse matrices ; SpGEMM ; Sunway TaihuLight supercomputer ; Parallel processing ; Performance analysis |
Online access: | Order full text |
container_end_page | 938 |
---|---|
container_issue | 4 |
container_start_page | 923 |
container_title | IEEE transactions on parallel and distributed systems |
container_volume | 30 |
creator | Chen, Yuedan Li, Kenli Yang, Wangdong Xiao, Guoqing Xie, Xianghui Li, Tao |
description | General sparse matrix-sparse matrix multiplication (SpGEMM) is one of the fundamental linear operations in a wide variety of scientific applications. To implement efficient SpGEMM for many large-scale applications, this paper proposes scalable and optimized SpGEMM kernels based on the COO, CSR, ELL, and CSC formats on the Sunway TaihuLight supercomputer. First, a multi-level parallelism design for SpGEMM is proposed to exploit the parallelism of over 10 million cores and to better control memory on the special Sunway architecture. Optimization strategies, such as load balancing, coalesced DMA transmission, data reuse, vectorized computation, and parallel pipeline processing, are applied to further optimize the performance of the SpGEMM kernels. Second, we thoroughly analyze the performance of the proposed kernels. Third, a performance-aware model for SpGEMM is proposed to select the most appropriate compressed storage formats for the sparse matrices, so that SpGEMM achieves optimal performance on the Sunway. The experimental results show that the SpGEMM kernels scale well and meet the challenge of high-speed computing on large-scale data sets on the Sunway. In addition, the performance-aware model for SpGEMM achieves an average absolute relative error of 8.31 percent when the kernels are executed in a single process and 8.59 percent when they are executed in multiple processes. This demonstrates that the proposed performance-aware model is highly accurate and precise enough to select the best formats for SpGEMM on the Sunway TaihuLight supercomputer. |
doi_str_mv | 10.1109/TPDS.2018.2871189 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1045-9219 |
ispartof | IEEE transactions on parallel and distributed systems, 2019-04, Vol.30 (4), p.923-938 |
issn | 1045-9219 1558-2183 |
language | eng |
recordid | cdi_proquest_journals_2191259292 |
source | IEEE Electronic Library Online |
subjects | Analytical models Computational modeling Computer architecture Heterogeneous many-core processor Kernel Kernels Mathematical analysis Matrix methods Model accuracy Multiplication Optimization Parallel processing parallelism performance analysis performance-aware Sparse matrices Sparsity SpGEMM Sunway TaihuLight supercomputer Supercomputers |
title | Performance-Aware Model for Sparse Matrix-Matrix Multiplication on the Sunway TaihuLight Supercomputer |
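The record's description mentions a performance-aware model that picks the best compressed storage format per matrix. The paper's model is fitted to measured kernel timings on the Sunway TaihuLight; the toy heuristic below (`choose_format` is a hypothetical name) only illustrates the kind of decision such a model automates, using per-row nonzero counts.

```python
def choose_format(row_nnz):
    """Toy heuristic for picking a compressed sparse storage format from the
    per-row nonzero counts of a matrix. Purely illustrative: the paper's
    performance-aware model is built from measured SpGEMM kernel performance,
    not from this rule of thumb.
    """
    if not row_nnz:
        return "COO"  # degenerate/empty input: the simplest format suffices
    mean = sum(row_nnz) / len(row_nnz)
    peak = max(row_nnz)
    # ELL pads every row to the length of the longest one, so it only pays
    # off when row lengths are nearly uniform; otherwise CSR avoids padding.
    if peak <= 1.5 * mean:
        return "ELL"
    return "CSR"
```

For instance, a matrix whose rows hold 4, 4, 5, and 4 nonzeros is nearly uniform and maps to ELL, while one with rows of 1, 1, and 50 nonzeros maps to CSR.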