Workload Balancing via Graph Reordering on Multicore Systems

In a shared-memory multicore system, the intrinsic irregular data structure of graphs leads to poor cache utilization, and therefore deteriorates the performance of graph analytics. To address the problem, prior works have proposed a variety of lightweight reordering methods with focus on the optimi...

Detailed description

Bibliographic details
Published in: IEEE transactions on parallel and distributed systems 2022-05, Vol.33 (5), p.1231-1245
Main authors: Chen, YuAng; Chung, Yeh-Ching
Format: Article
Language: eng
container_end_page 1245
container_issue 5
container_start_page 1231
container_title IEEE transactions on parallel and distributed systems
container_volume 33
creator Chen, YuAng
Chung, Yeh-Ching
description In a shared-memory multicore system, the intrinsic irregular data structure of graphs leads to poor cache utilization, and therefore deteriorates the performance of graph analytics. To address the problem, prior works have proposed a variety of lightweight reordering methods with focus on the optimization of cache locality. However, there is a compromise between cache locality and workload balance. Little insight has been devoted into the issue of workload imbalance for the underlying multicore system, which degrades the effectiveness of parallel graph processing. In this work, a measurement approach is proposed to quantify the imbalance incurred by the concentration of vertices. Inspired by it, we present Cache-aware Reorder (Corder), a lightweight reordering method exploiting the cache hierarchy of multicore systems. At the shared-memory level, Corder promotes even distribution of computation loads amongst multicores. At the private-cache level, Corder facilitates cache efficiency by applying further refinement to local vertex order. Comprehensive performance evaluation of Corder is conducted on various graph applications and datasets. Experimental results show that Corder yields speedup of up to 2.59× and on average 1.45×, which significantly outperforms existing lightweight reordering methods. To identify the root causes of performance boost delivered by Corder, multicore activities are investigated in terms of thread behavior, cache efficiency, and memory utilization. Statistical analysis demonstrates that the issue of imbalanced thread execution time dominates other factors in determining the overall graph processing time. Moreover, Corder achieves remarkable advantages in cross-platform scalability and reordering overhead.
doi_str_mv 10.1109/TPDS.2021.3105323
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TPDS_2021_3105323</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9516878</ieee_id><sourcerecordid>2583636685</sourcerecordid><originalsourceid>FETCH-LOGICAL-c245t-c25f21f7906c85bc7252bcec750f972ac85f34a97616d059c1dcf001be19bd03</originalsourceid><addsrcrecordid>eNo9kE9LAzEQxYMoWKsfQLwseN6aSTbZBLz4twoVxRY8hmw20a3bTU22Qr-9WVq8zAyP92aGH0LngCcAWF4t3u7nE4IJTChgRgk9QCNgTOQEBD1MMy5YLgnIY3QS4xJjKBguRuj6w4fv1us6u9Wt7kzTfWa_jc6mQa-_snfrQ23DIPoue9m0fWN8sNl8G3u7iqfoyOk22rN9H6PF48Pi7imfvU6f725muSEF61NljoArJeZGsMqUhJHKWFMy7GRJdBIdLbQsOfAaM2mgNi59WFmQVY3pGF3u1q6D_9nY2Kul34QuXVSECcop54IlF-xcJvgYg3VqHZqVDlsFWA2M1MBIDYzUnlHKXOwyjbX23y8ZcFEK-gcbtmGz</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2583636685</pqid></control><display><type>article</type><title>Workload Balancing via Graph Reordering on Multicore Systems</title><source>IEEE Electronic Library Online</source><creator>Chen, YuAng ; Chung, Yeh-Ching</creator><creatorcontrib>Chen, YuAng ; Chung, Yeh-Ching</creatorcontrib><description><![CDATA[In a shared-memory multicore system, the intrinsic irregular data structure of graphs leads to poor cache utilization, and therefore deteriorates the performance of graph analytics. To address the problem, prior works have proposed a variety of lightweight reordering methods with focus on the optimization of cache locality. However, there is a compromise between cache locality and workload balance. Little insight has been devoted into the issue of workload imbalance for the underlying multicore system, which degrades the effectiveness of parallel graph processing. In this work, a measurement approach is proposed to quantify the imbalance incurred by the concentration of vertices. 
Inspired by it, we present Cache-aware Reorder (Corder) , a lightweight reordering method exploiting the cache hierarchy of multicore systems. At the shared-memory level, Corder promotes even distribution of computation loads amongst multicores. At the private-cache level, Corder facilitates cache efficiency by applying further refinement to local vertex order. Comprehensive performance evaluation of Corder is conducted on various graph applications and datasets. Experimental results show that Corder yields speedup of up to <inline-formula><tex-math notation="LaTeX">2.59\times</tex-math> <mml:math><mml:mrow><mml:mn>2</mml:mn><mml:mo>.</mml:mo><mml:mn>59</mml:mn><mml:mo>×</mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="chung-ieq1-3105323.gif"/> </inline-formula> and on average <inline-formula><tex-math notation="LaTeX">1.45\times</tex-math> <mml:math><mml:mrow><mml:mn>1</mml:mn><mml:mo>.</mml:mo><mml:mn>45</mml:mn><mml:mo>×</mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="chung-ieq2-3105323.gif"/> </inline-formula>, which significantly outperforms existing lightweight reordering methods. To identify the root causes of performance boost delivered by Corder, multicore activities are investigated in terms of thread behavior, cache efficiency, and memory utilization. Statistical analysis demonstrates that the issue of imbalanced thread execution time dominates other factors in determining the overall graph processing time. 
Moreover, Corder achieves remarkable advantages in cross-platform scalability and reordering overhead.]]></description><identifier>ISSN: 1045-9219</identifier><identifier>EISSN: 1558-2183</identifier><identifier>DOI: 10.1109/TPDS.2021.3105323</identifier><identifier>CODEN: ITDSEO</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Apexes ; Blogs ; cache locality ; Data structures ; graph processing ; Instruction sets ; Lightweight ; Load balancing ; Multicore processing ; Multicore system ; Optimization ; Parallel processing ; Performance evaluation ; Social networking (online) ; Sorting ; Statistical analysis ; Workload ; workload balance ; Workloads</subject><ispartof>IEEE transactions on parallel and distributed systems, 2022-05, Vol.33 (5), p.1231-1245</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c245t-c25f21f7906c85bc7252bcec750f972ac85f34a97616d059c1dcf001be19bd03</cites><orcidid>0000-0002-3392-8388 ; 0000-0002-8704-9821</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9516878$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9516878$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Chen, YuAng</creatorcontrib><creatorcontrib>Chung, Yeh-Ching</creatorcontrib><title>Workload Balancing via Graph Reordering on Multicore Systems</title><title>IEEE transactions on parallel and distributed systems</title><addtitle>TPDS</addtitle><description><![CDATA[In a shared-memory multicore system, the intrinsic irregular data structure of graphs leads to poor cache 
utilization, and therefore deteriorates the performance of graph analytics. To address the problem, prior works have proposed a variety of lightweight reordering methods with focus on the optimization of cache locality. However, there is a compromise between cache locality and workload balance. Little insight has been devoted into the issue of workload imbalance for the underlying multicore system, which degrades the effectiveness of parallel graph processing. In this work, a measurement approach is proposed to quantify the imbalance incurred by the concentration of vertices. Inspired by it, we present Cache-aware Reorder (Corder) , a lightweight reordering method exploiting the cache hierarchy of multicore systems. At the shared-memory level, Corder promotes even distribution of computation loads amongst multicores. At the private-cache level, Corder facilitates cache efficiency by applying further refinement to local vertex order. Comprehensive performance evaluation of Corder is conducted on various graph applications and datasets. Experimental results show that Corder yields speedup of up to <inline-formula><tex-math notation="LaTeX">2.59\times</tex-math> <mml:math><mml:mrow><mml:mn>2</mml:mn><mml:mo>.</mml:mo><mml:mn>59</mml:mn><mml:mo>×</mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="chung-ieq1-3105323.gif"/> </inline-formula> and on average <inline-formula><tex-math notation="LaTeX">1.45\times</tex-math> <mml:math><mml:mrow><mml:mn>1</mml:mn><mml:mo>.</mml:mo><mml:mn>45</mml:mn><mml:mo>×</mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="chung-ieq2-3105323.gif"/> </inline-formula>, which significantly outperforms existing lightweight reordering methods. To identify the root causes of performance boost delivered by Corder, multicore activities are investigated in terms of thread behavior, cache efficiency, and memory utilization. 
Statistical analysis demonstrates that the issue of imbalanced thread execution time dominates other factors in determining the overall graph processing time. Moreover, Corder achieves remarkable advantages in cross-platform scalability and reordering overhead.]]></description><subject>Apexes</subject><subject>Blogs</subject><subject>cache locality</subject><subject>Data structures</subject><subject>graph processing</subject><subject>Instruction sets</subject><subject>Lightweight</subject><subject>Load balancing</subject><subject>Multicore processing</subject><subject>Multicore system</subject><subject>Optimization</subject><subject>Parallel processing</subject><subject>Performance evaluation</subject><subject>Social networking (online)</subject><subject>Sorting</subject><subject>Statistical analysis</subject><subject>Workload</subject><subject>workload balance</subject><subject>Workloads</subject><issn>1045-9219</issn><issn>1558-2183</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kE9LAzEQxYMoWKsfQLwseN6aSTbZBLz4twoVxRY8hmw20a3bTU22Qr-9WVq8zAyP92aGH0LngCcAWF4t3u7nE4IJTChgRgk9QCNgTOQEBD1MMy5YLgnIY3QS4xJjKBguRuj6w4fv1us6u9Wt7kzTfWa_jc6mQa-_snfrQ23DIPoue9m0fWN8sNl8G3u7iqfoyOk22rN9H6PF48Pi7imfvU6f725muSEF61NljoArJeZGsMqUhJHKWFMy7GRJdBIdLbQsOfAaM2mgNi59WFmQVY3pGF3u1q6D_9nY2Kul34QuXVSECcop54IlF-xcJvgYg3VqHZqVDlsFWA2M1MBIDYzUnlHKXOwyjbX23y8ZcFEK-gcbtmGz</recordid><startdate>20220501</startdate><enddate>20220501</enddate><creator>Chen, YuAng</creator><creator>Chung, Yeh-Ching</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-3392-8388</orcidid><orcidid>https://orcid.org/0000-0002-8704-9821</orcidid></search><sort><creationdate>20220501</creationdate><title>Workload Balancing via Graph Reordering on Multicore Systems</title><author>Chen, YuAng ; Chung, Yeh-Ching</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c245t-c25f21f7906c85bc7252bcec750f972ac85f34a97616d059c1dcf001be19bd03</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Apexes</topic><topic>Blogs</topic><topic>cache locality</topic><topic>Data structures</topic><topic>graph processing</topic><topic>Instruction sets</topic><topic>Lightweight</topic><topic>Load balancing</topic><topic>Multicore processing</topic><topic>Multicore system</topic><topic>Optimization</topic><topic>Parallel processing</topic><topic>Performance evaluation</topic><topic>Social networking (online)</topic><topic>Sorting</topic><topic>Statistical analysis</topic><topic>Workload</topic><topic>workload balance</topic><topic>Workloads</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Chen, YuAng</creatorcontrib><creatorcontrib>Chung, Yeh-Ching</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005–Present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998–Present</collection><collection>IEEE Electronic Library Online</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science 
Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on parallel and distributed systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Chen, YuAng</au><au>Chung, Yeh-Ching</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Workload Balancing via Graph Reordering on Multicore Systems</atitle><jtitle>IEEE transactions on parallel and distributed systems</jtitle><stitle>TPDS</stitle><date>2022-05-01</date><risdate>2022</risdate><volume>33</volume><issue>5</issue><spage>1231</spage><epage>1245</epage><pages>1231-1245</pages><issn>1045-9219</issn><eissn>1558-2183</eissn><coden>ITDSEO</coden><abstract><![CDATA[In a shared-memory multicore system, the intrinsic irregular data structure of graphs leads to poor cache utilization, and therefore deteriorates the performance of graph analytics. To address the problem, prior works have proposed a variety of lightweight reordering methods with focus on the optimization of cache locality. However, there is a compromise between cache locality and workload balance. Little insight has been devoted into the issue of workload imbalance for the underlying multicore system, which degrades the effectiveness of parallel graph processing. In this work, a measurement approach is proposed to quantify the imbalance incurred by the concentration of vertices. Inspired by it, we present Cache-aware Reorder (Corder) , a lightweight reordering method exploiting the cache hierarchy of multicore systems. At the shared-memory level, Corder promotes even distribution of computation loads amongst multicores. 
At the private-cache level, Corder facilitates cache efficiency by applying further refinement to local vertex order. Comprehensive performance evaluation of Corder is conducted on various graph applications and datasets. Experimental results show that Corder yields speedup of up to <inline-formula><tex-math notation="LaTeX">2.59\times</tex-math> <mml:math><mml:mrow><mml:mn>2</mml:mn><mml:mo>.</mml:mo><mml:mn>59</mml:mn><mml:mo>×</mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="chung-ieq1-3105323.gif"/> </inline-formula> and on average <inline-formula><tex-math notation="LaTeX">1.45\times</tex-math> <mml:math><mml:mrow><mml:mn>1</mml:mn><mml:mo>.</mml:mo><mml:mn>45</mml:mn><mml:mo>×</mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="chung-ieq2-3105323.gif"/> </inline-formula>, which significantly outperforms existing lightweight reordering methods. To identify the root causes of performance boost delivered by Corder, multicore activities are investigated in terms of thread behavior, cache efficiency, and memory utilization. Statistical analysis demonstrates that the issue of imbalanced thread execution time dominates other factors in determining the overall graph processing time. Moreover, Corder achieves remarkable advantages in cross-platform scalability and reordering overhead.]]></abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TPDS.2021.3105323</doi><tpages>15</tpages><orcidid>https://orcid.org/0000-0002-3392-8388</orcidid><orcidid>https://orcid.org/0000-0002-8704-9821</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1045-9219
ispartof IEEE transactions on parallel and distributed systems, 2022-05, Vol.33 (5), p.1231-1245
issn 1045-9219
1558-2183
language eng
recordid cdi_crossref_primary_10_1109_TPDS_2021_3105323
source IEEE Electronic Library Online
subjects Apexes
Blogs
cache locality
Data structures
graph processing
Instruction sets
Lightweight
Load balancing
Multicore processing
Multicore system
Optimization
Parallel processing
Performance evaluation
Social networking (online)
Sorting
Statistical analysis
Workload
workload balance
Workloads
title Workload Balancing via Graph Reordering on Multicore Systems
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T06%3A54%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Workload%20Balancing%20via%20Graph%20Reordering%20on%20Multicore%20Systems&rft.jtitle=IEEE%20transactions%20on%20parallel%20and%20distributed%20systems&rft.au=Chen,%20YuAng&rft.date=2022-05-01&rft.volume=33&rft.issue=5&rft.spage=1231&rft.epage=1245&rft.pages=1231-1245&rft.issn=1045-9219&rft.eissn=1558-2183&rft.coden=ITDSEO&rft_id=info:doi/10.1109/TPDS.2021.3105323&rft_dat=%3Cproquest_RIE%3E2583636685%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2583636685&rft_id=info:pmid/&rft_ieee_id=9516878&rfr_iscdi=true