Performance-Aware Model for Sparse Matrix-Matrix Multiplication on the Sunway TaihuLight Supercomputer
General sparse matrix-sparse matrix multiplication (SpGEMM) is one of the fundamental linear operations in a wide variety of scientific applications. To implement efficient SpGEMM for many large-scale applications, this paper proposes scalable and optimized SpGEMM kernels based on the COO, CSR, ELL, and CSC formats on the Sunway TaihuLight supercomputer.
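As background for the abstract above, the sketch below shows the row-by-row accumulation that a serial SpGEMM performs on COO-format inputs (triples of row, column, value). The function `spgemm_coo` and the triple-list representation are illustrative assumptions for exposition only, not the paper's parallel Sunway kernels.

```python
from collections import defaultdict

def spgemm_coo(a, b):
    """Multiply two sparse matrices given as COO triple lists [(row, col, val), ...].

    Uses a hash-map accumulator over output entries, a simple serial analogue
    of the Gustavson-style accumulation that parallel SpGEMM kernels perform.
    Returns the product as a sorted COO triple list.
    """
    # Index B's nonzeros by row so each nonzero a_ik can find matching b_kj quickly.
    b_rows = defaultdict(list)
    for k, j, v in b:
        b_rows[k].append((j, v))

    acc = defaultdict(float)  # (i, j) -> accumulated value of the product
    for i, k, av in a:
        for j, bv in b_rows[k]:
            acc[(i, j)] += av * bv
    # Drop explicit zeros that cancelled out and return deterministic order.
    return sorted((i, j, v) for (i, j), v in acc.items() if v != 0.0)
```

For example, multiplying A = [[1, 2], [0, 3]] by B = [[4, 0], [5, 6]] in this representation yields the COO triples of [[14, 12], [15, 18]].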
Saved in:
Published in: | IEEE transactions on parallel and distributed systems 2019-04, Vol.30 (4), p.923-938 |
---|---|
Main authors: | Chen, Yuedan ; Li, Kenli ; Yang, Wangdong ; Xiao, Guoqing ; Xie, Xianghui ; Li, Tao |
Format: | Article |
Language: | eng |
Subjects: | Sparse matrices ; SpGEMM ; Sunway TaihuLight supercomputer ; Parallel processing ; Performance analysis |
Online access: | Order full text |
container_end_page | 938 |
---|---|
container_issue | 4 |
container_start_page | 923 |
container_title | IEEE transactions on parallel and distributed systems |
container_volume | 30 |
creator | Chen, Yuedan Li, Kenli Yang, Wangdong Xiao, Guoqing Xie, Xianghui Li, Tao |
description | General sparse matrix-sparse matrix multiplication (SpGEMM) is one of the fundamental linear operations in a wide variety of scientific applications. To implement efficient SpGEMM for many large-scale applications, this paper proposes scalable and optimized SpGEMM kernels based on the COO, CSR, ELL, and CSC formats on the Sunway TaihuLight supercomputer. First, a multi-level parallelism design for SpGEMM is proposed to exploit the parallelism of over 10 million cores and to better control memory on the special Sunway architecture. Optimization strategies, such as load balancing, coalesced DMA transmission, data reuse, vectorized computation, and parallel pipeline processing, are applied to further optimize the performance of the SpGEMM kernels. Second, we thoroughly analyze the performance of the proposed kernels. Third, a performance-aware model for SpGEMM is proposed to select the most appropriate compressed storage formats for the sparse matrices, so that SpGEMM achieves optimal performance on the Sunway. The experimental results show that the SpGEMM kernels scale well and meet the challenge of high-speed computing on large-scale data sets on the Sunway. In addition, the performance-aware model for SpGEMM achieves an average absolute relative error of 8.31 percent when the kernels are executed in a single process and 8.59 percent when they are executed in multiple processes. This demonstrates that the proposed performance-aware model is highly accurate and precise enough to select the best formats for SpGEMM on the Sunway TaihuLight supercomputer. |
doi_str_mv | 10.1109/TPDS.2018.2871189 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1045-9219 |
ispartof | IEEE transactions on parallel and distributed systems, 2019-04, Vol.30 (4), p.923-938 |
issn | 1045-9219 1558-2183 |
language | eng |
recordid | cdi_proquest_journals_2191259292 |
source | IEEE Electronic Library Online |
subjects | Analytical models Computational modeling Computer architecture Heterogeneous many-core processor Kernel Kernels Mathematical analysis Matrix methods Model accuracy Multiplication Optimization Parallel processing parallelism performance analysis performance-aware Sparse matrices Sparsity SpGEMM Sunway TaihuLight supercomputer Supercomputers |
title | Performance-Aware Model for Sparse Matrix-Matrix Multiplication on the Sunway TaihuLight Supercomputer |
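The record's description mentions a performance-aware model that picks the best compressed storage format per matrix. The paper's model is fitted to measured kernel timings on the Sunway TaihuLight; the toy heuristic below (`choose_format` is a hypothetical name) only illustrates the kind of decision such a model automates, using per-row nonzero counts.

```python
def choose_format(row_nnz):
    """Toy heuristic for picking a compressed sparse storage format from the
    per-row nonzero counts of a matrix. Purely illustrative: the paper's
    performance-aware model is built from measured SpGEMM kernel performance,
    not from this rule of thumb.
    """
    if not row_nnz:
        return "COO"  # degenerate/empty input: the simplest format suffices
    mean = sum(row_nnz) / len(row_nnz)
    peak = max(row_nnz)
    # ELL pads every row to the length of the longest one, so it only pays
    # off when row lengths are nearly uniform; otherwise CSR avoids padding.
    if peak <= 1.5 * mean:
        return "ELL"
    return "CSR"
```

For instance, a matrix whose rows hold 4, 4, 5, and 4 nonzeros is nearly uniform and maps to ELL, while one with rows of 1, 1, and 50 nonzeros maps to CSR.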