Workload Balancing via Graph Reordering on Multicore Systems

In a shared-memory multicore system, the intrinsic irregular data structure of graphs leads to poor cache utilization, and therefore deteriorates the performance of graph analytics. To address the problem, prior works have proposed a variety of lightweight reordering methods with focus on the optimi...

Bibliographic details
Published in:	IEEE transactions on parallel and distributed systems 2022-05, Vol.33 (5), p.1231-1245
Main authors:	Chen, YuAng; Chung, Yeh-Ching
Format:	Article
Language:	eng
Subjects:	Apexes; Blogs; cache locality; Data structures; graph processing; Instruction sets; Lightweight; Load balancing; Multicore processing; Multicore system; Optimization; Parallel processing; Performance evaluation; Social networking (online); Sorting; Statistical analysis; Workload; workload balance; Workloads
Online access:	Order full text

container_end_page	1245
container_issue	5
container_start_page	1231
container_title	IEEE transactions on parallel and distributed systems
container_volume	33
creator	Chen, YuAng; Chung, Yeh-Ching
description	In a shared-memory multicore system, the intrinsic irregular data structure of graphs leads to poor cache utilization, and therefore deteriorates the performance of graph analytics. To address the problem, prior works have proposed a variety of lightweight reordering methods with focus on the optimization of cache locality. However, there is a compromise between cache locality and workload balance. Little insight has been devoted into the issue of workload imbalance for the underlying multicore system, which degrades the effectiveness of parallel graph processing. In this work, a measurement approach is proposed to quantify the imbalance incurred by the concentration of vertices. Inspired by it, we present Cache-aware Reorder (Corder), a lightweight reordering method exploiting the cache hierarchy of multicore systems. At the shared-memory level, Corder promotes even distribution of computation loads amongst multicores. At the private-cache level, Corder facilitates cache efficiency by applying further refinement to local vertex order. Comprehensive performance evaluation of Corder is conducted on various graph applications and datasets. Experimental results show that Corder yields speedup of up to 2.59× and on average 1.45×, which significantly outperforms existing lightweight reordering methods. To identify the root causes of performance boost delivered by Corder, multicore activities are investigated in terms of thread behavior, cache efficiency, and memory utilization. Statistical analysis demonstrates that the issue of imbalanced thread execution time dominates other factors in determining the overall graph processing time. Moreover, Corder achieves remarkable advantages in cross-platform scalability and reordering overhead.
doi_str_mv	10.1109/TPDS.2021.3105323
format	Article
ieee_id	9516878
publisher	New York: IEEE
coden	ITDSEO
rights	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
orcidid	https://orcid.org/0000-0002-3392-8388 ; https://orcid.org/0000-0002-8704-9821
toplevel	peer_reviewed; online_resources
date	2022-05-01
tpages	15
linktohtml	https://ieeexplore.ieee.org/document/9516878
collection	IEEE All-Society Periodicals Package (ASPP) 2005–Present; IEEE All-Society Periodicals Package (ASPP) 1998–Present; IEEE Electronic Library Online; CrossRef; Computer and Information Systems Abstracts; Electronics & Communications Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional
fulltext	fulltext_linktorsrc
identifier	ISSN: 1045-9219
ispartof	IEEE transactions on parallel and distributed systems, 2022-05, Vol.33 (5), p.1231-1245
issn	1045-9219 1558-2183
language	eng
recordid	cdi_crossref_primary_10_1109_TPDS_2021_3105323
source	IEEE Electronic Library Online
subjects	Apexes; Blogs; cache locality; Data structures; graph processing; Instruction sets; Lightweight; Load balancing; Multicore processing; Multicore system; Optimization; Parallel processing; Performance evaluation; Social networking (online); Sorting; Statistical analysis; Workload; workload balance; Workloads
title	Workload Balancing via Graph Reordering on Multicore Systems
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T06%3A54%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Workload%20Balancing%20via%20Graph%20Reordering%20on%20Multicore%20Systems&rft.jtitle=IEEE%20transactions%20on%20parallel%20and%20distributed%20systems&rft.au=Chen,%20YuAng&rft.date=2022-05-01&rft.volume=33&rft.issue=5&rft.spage=1231&rft.epage=1245&rft.pages=1231-1245&rft.issn=1045-9219&rft.eissn=1558-2183&rft.coden=ITDSEO&rft_id=info:doi/10.1109/TPDS.2021.3105323&rft_dat=%3Cproquest_RIE%3E2583636685%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2583636685&rft_id=info:pmid/&rft_ieee_id=9516878&rfr_iscdi=true
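
The description field above says that concentrating vertices can unbalance the work assigned to each core, and that a reordering can spread the load more evenly. The sketch below is only an illustration of that general idea, not the Corder algorithm from the article. It assumes that a vertex's degree approximates its work, that each core processes one contiguous chunk of the vertex order, and that imbalance can be summarized as the heaviest chunk load divided by the mean load. The function names `split`, `imbalance`, and `balanced_order` are hypothetical and do not come from the article.

```python
from typing import List


def split(order: List[int], parts: int) -> List[List[int]]:
    """Cut a vertex order into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(order), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(order[start:end])
        start = end
    return chunks


def imbalance(degrees: List[int], order: List[int], parts: int) -> float:
    """Heaviest chunk load divided by mean chunk load (1.0 means perfectly balanced).

    Assumption: the work for a vertex is proportional to its degree.
    """
    loads = [sum(degrees[v] for v in chunk) for chunk in split(order, parts)]
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0


def balanced_order(degrees: List[int], parts: int) -> List[int]:
    """Deal vertices to chunks round-robin in descending-degree order.

    High-degree vertices end up spread across chunks instead of
    concentrated in the first one. This is a generic heuristic used
    for illustration, not the method described in the article.
    """
    by_degree = sorted(range(len(degrees)), key=lambda v: -degrees[v])
    buckets: List[List[int]] = [[] for _ in range(parts)]
    for i, v in enumerate(by_degree):
        buckets[i % parts].append(v)
    return [v for bucket in buckets for v in bucket]


if __name__ == "__main__":
    degrees = [8, 8, 1, 1]            # two hub vertices followed by two low-degree vertices
    original = list(range(len(degrees)))
    reordered = balanced_order(degrees, parts=2)
    print("imbalance before:", imbalance(degrees, original, 2))
    print("imbalance after: ", imbalance(degrees, reordered, 2))
```

The round-robin deal keeps chunk sizes within one vertex of each other, so the chunks that `split` cuts from the new order are the same buckets the vertices were dealt into. The sketch ignores cache locality entirely, although the abstract describes the trade-off between locality and balance as central.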