Workload Balancing via Graph Reordering on Multicore Systems
In a shared-memory multicore system, the intrinsically irregular structure of graph data leads to poor cache utilization and therefore degrades the performance of graph analytics. To address this problem, prior works have proposed a variety of lightweight reordering methods with focus on the optimi...
Published in: | IEEE transactions on parallel and distributed systems 2022-05, Vol.33 (5), p.1231-1245 |
---|---|
Main authors: | Chen, YuAng ; Chung, Yeh-Ching |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 1245 |
---|---|
container_issue | 5 |
container_start_page | 1231 |
container_title | IEEE transactions on parallel and distributed systems |
container_volume | 33 |
creator | Chen, YuAng ; Chung, Yeh-Ching |
description | In a shared-memory multicore system, the intrinsically irregular structure of graph data leads to poor cache utilization and therefore degrades the performance of graph analytics. To address this problem, prior works have proposed a variety of lightweight reordering methods that focus on optimizing cache locality. However, there is a trade-off between cache locality and workload balance. Little attention has been devoted to the issue of workload imbalance on the underlying multicore system, which degrades the effectiveness of parallel graph processing. In this work, a measurement approach is proposed to quantify the imbalance incurred by the concentration of vertices. Building on it, we present Cache-aware Reorder (Corder), a lightweight reordering method that exploits the cache hierarchy of multicore systems. At the shared-memory level, Corder promotes an even distribution of computation load among the cores. At the private-cache level, Corder improves cache efficiency by further refining the local vertex order. Corder is evaluated comprehensively on various graph applications and datasets. Experimental results show that Corder yields a speedup of up to 2.59× and 1.45× on average, significantly outperforming existing lightweight reordering methods. To identify the root causes of the performance gain delivered by Corder, multicore activity is investigated in terms of thread behavior, cache efficiency, and memory utilization. Statistical analysis shows that imbalanced thread execution time dominates the other factors in determining overall graph processing time. Moreover, Corder achieves remarkable advantages in cross-platform scalability and reordering overhead. |
doi_str_mv | 10.1109/TPDS.2021.3105323 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TPDS_2021_3105323</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9516878</ieee_id><sourcerecordid>2583636685</sourcerecordid><originalsourceid>FETCH-LOGICAL-c245t-c25f21f7906c85bc7252bcec750f972ac85f34a97616d059c1dcf001be19bd03</originalsourceid><addsrcrecordid>eNo9kE9LAzEQxYMoWKsfQLwseN6aSTbZBLz4twoVxRY8hmw20a3bTU22Qr-9WVq8zAyP92aGH0LngCcAWF4t3u7nE4IJTChgRgk9QCNgTOQEBD1MMy5YLgnIY3QS4xJjKBguRuj6w4fv1us6u9Wt7kzTfWa_jc6mQa-_snfrQ23DIPoue9m0fWN8sNl8G3u7iqfoyOk22rN9H6PF48Pi7imfvU6f725muSEF61NljoArJeZGsMqUhJHKWFMy7GRJdBIdLbQsOfAaM2mgNi59WFmQVY3pGF3u1q6D_9nY2Kul34QuXVSECcop54IlF-xcJvgYg3VqHZqVDlsFWA2M1MBIDYzUnlHKXOwyjbX23y8ZcFEK-gcbtmGz</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2583636685</pqid></control><display><type>article</type><title>Workload Balancing via Graph Reordering on Multicore Systems</title><source>IEEE Electronic Library Online</source><creator>Chen, YuAng ; Chung, Yeh-Ching</creator><creatorcontrib>Chen, YuAng ; Chung, Yeh-Ching</creatorcontrib><description><![CDATA[In a shared-memory multicore system, the intrinsic irregular data structure of graphs leads to poor cache utilization, and therefore deteriorates the performance of graph analytics. To address the problem, prior works have proposed a variety of lightweight reordering methods with focus on the optimization of cache locality. However, there is a compromise between cache locality and workload balance. Little insight has been devoted into the issue of workload imbalance for the underlying multicore system, which degrades the effectiveness of parallel graph processing. In this work, a measurement approach is proposed to quantify the imbalance incurred by the concentration of vertices. 
Inspired by it, we present Cache-aware Reorder (Corder) , a lightweight reordering method exploiting the cache hierarchy of multicore systems. At the shared-memory level, Corder promotes even distribution of computation loads amongst multicores. At the private-cache level, Corder facilitates cache efficiency by applying further refinement to local vertex order. Comprehensive performance evaluation of Corder is conducted on various graph applications and datasets. Experimental results show that Corder yields speedup of up to <inline-formula><tex-math notation="LaTeX">2.59\times</tex-math> <mml:math><mml:mrow><mml:mn>2</mml:mn><mml:mo>.</mml:mo><mml:mn>59</mml:mn><mml:mo>×</mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="chung-ieq1-3105323.gif"/> </inline-formula> and on average <inline-formula><tex-math notation="LaTeX">1.45\times</tex-math> <mml:math><mml:mrow><mml:mn>1</mml:mn><mml:mo>.</mml:mo><mml:mn>45</mml:mn><mml:mo>×</mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="chung-ieq2-3105323.gif"/> </inline-formula>, which significantly outperforms existing lightweight reordering methods. To identify the root causes of performance boost delivered by Corder, multicore activities are investigated in terms of thread behavior, cache efficiency, and memory utilization. Statistical analysis demonstrates that the issue of imbalanced thread execution time dominates other factors in determining the overall graph processing time. 
Moreover, Corder achieves remarkable advantages in cross-platform scalability and reordering overhead.]]></description><identifier>ISSN: 1045-9219</identifier><identifier>EISSN: 1558-2183</identifier><identifier>DOI: 10.1109/TPDS.2021.3105323</identifier><identifier>CODEN: ITDSEO</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Apexes ; Blogs ; cache locality ; Data structures ; graph processing ; Instruction sets ; Lightweight ; Load balancing ; Multicore processing ; Multicore system ; Optimization ; Parallel processing ; Performance evaluation ; Social networking (online) ; Sorting ; Statistical analysis ; Workload ; workload balance ; Workloads</subject><ispartof>IEEE transactions on parallel and distributed systems, 2022-05, Vol.33 (5), p.1231-1245</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c245t-c25f21f7906c85bc7252bcec750f972ac85f34a97616d059c1dcf001be19bd03</cites><orcidid>0000-0002-3392-8388 ; 0000-0002-8704-9821</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9516878$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9516878$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Chen, YuAng</creatorcontrib><creatorcontrib>Chung, Yeh-Ching</creatorcontrib><title>Workload Balancing via Graph Reordering on Multicore Systems</title><title>IEEE transactions on parallel and distributed systems</title><addtitle>TPDS</addtitle><description><![CDATA[In a shared-memory multicore system, the intrinsic irregular data structure of graphs leads to poor cache 
utilization, and therefore deteriorates the performance of graph analytics. To address the problem, prior works have proposed a variety of lightweight reordering methods with focus on the optimization of cache locality. However, there is a compromise between cache locality and workload balance. Little insight has been devoted into the issue of workload imbalance for the underlying multicore system, which degrades the effectiveness of parallel graph processing. In this work, a measurement approach is proposed to quantify the imbalance incurred by the concentration of vertices. Inspired by it, we present Cache-aware Reorder (Corder) , a lightweight reordering method exploiting the cache hierarchy of multicore systems. At the shared-memory level, Corder promotes even distribution of computation loads amongst multicores. At the private-cache level, Corder facilitates cache efficiency by applying further refinement to local vertex order. Comprehensive performance evaluation of Corder is conducted on various graph applications and datasets. Experimental results show that Corder yields speedup of up to <inline-formula><tex-math notation="LaTeX">2.59\times</tex-math> <mml:math><mml:mrow><mml:mn>2</mml:mn><mml:mo>.</mml:mo><mml:mn>59</mml:mn><mml:mo>×</mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="chung-ieq1-3105323.gif"/> </inline-formula> and on average <inline-formula><tex-math notation="LaTeX">1.45\times</tex-math> <mml:math><mml:mrow><mml:mn>1</mml:mn><mml:mo>.</mml:mo><mml:mn>45</mml:mn><mml:mo>×</mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="chung-ieq2-3105323.gif"/> </inline-formula>, which significantly outperforms existing lightweight reordering methods. To identify the root causes of performance boost delivered by Corder, multicore activities are investigated in terms of thread behavior, cache efficiency, and memory utilization. 
Statistical analysis demonstrates that the issue of imbalanced thread execution time dominates other factors in determining the overall graph processing time. Moreover, Corder achieves remarkable advantages in cross-platform scalability and reordering overhead.]]></description><subject>Apexes</subject><subject>Blogs</subject><subject>cache locality</subject><subject>Data structures</subject><subject>graph processing</subject><subject>Instruction sets</subject><subject>Lightweight</subject><subject>Load balancing</subject><subject>Multicore processing</subject><subject>Multicore system</subject><subject>Optimization</subject><subject>Parallel processing</subject><subject>Performance evaluation</subject><subject>Social networking (online)</subject><subject>Sorting</subject><subject>Statistical analysis</subject><subject>Workload</subject><subject>workload balance</subject><subject>Workloads</subject><issn>1045-9219</issn><issn>1558-2183</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kE9LAzEQxYMoWKsfQLwseN6aSTbZBLz4twoVxRY8hmw20a3bTU22Qr-9WVq8zAyP92aGH0LngCcAWF4t3u7nE4IJTChgRgk9QCNgTOQEBD1MMy5YLgnIY3QS4xJjKBguRuj6w4fv1us6u9Wt7kzTfWa_jc6mQa-_snfrQ23DIPoue9m0fWN8sNl8G3u7iqfoyOk22rN9H6PF48Pi7imfvU6f725muSEF61NljoArJeZGsMqUhJHKWFMy7GRJdBIdLbQsOfAaM2mgNi59WFmQVY3pGF3u1q6D_9nY2Kul34QuXVSECcop54IlF-xcJvgYg3VqHZqVDlsFWA2M1MBIDYzUnlHKXOwyjbX23y8ZcFEK-gcbtmGz</recordid><startdate>20220501</startdate><enddate>20220501</enddate><creator>Chen, YuAng</creator><creator>Chung, Yeh-Ching</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-3392-8388</orcidid><orcidid>https://orcid.org/0000-0002-8704-9821</orcidid></search><sort><creationdate>20220501</creationdate><title>Workload Balancing via Graph Reordering on Multicore Systems</title><author>Chen, YuAng ; Chung, Yeh-Ching</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c245t-c25f21f7906c85bc7252bcec750f972ac85f34a97616d059c1dcf001be19bd03</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Apexes</topic><topic>Blogs</topic><topic>cache locality</topic><topic>Data structures</topic><topic>graph processing</topic><topic>Instruction sets</topic><topic>Lightweight</topic><topic>Load balancing</topic><topic>Multicore processing</topic><topic>Multicore system</topic><topic>Optimization</topic><topic>Parallel processing</topic><topic>Performance evaluation</topic><topic>Social networking (online)</topic><topic>Sorting</topic><topic>Statistical analysis</topic><topic>Workload</topic><topic>workload balance</topic><topic>Workloads</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Chen, YuAng</creatorcontrib><creatorcontrib>Chung, Yeh-Ching</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005–Present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998–Present</collection><collection>IEEE Electronic Library Online</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science 
Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on parallel and distributed systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Chen, YuAng</au><au>Chung, Yeh-Ching</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Workload Balancing via Graph Reordering on Multicore Systems</atitle><jtitle>IEEE transactions on parallel and distributed systems</jtitle><stitle>TPDS</stitle><date>2022-05-01</date><risdate>2022</risdate><volume>33</volume><issue>5</issue><spage>1231</spage><epage>1245</epage><pages>1231-1245</pages><issn>1045-9219</issn><eissn>1558-2183</eissn><coden>ITDSEO</coden><abstract><![CDATA[In a shared-memory multicore system, the intrinsic irregular data structure of graphs leads to poor cache utilization, and therefore deteriorates the performance of graph analytics. To address the problem, prior works have proposed a variety of lightweight reordering methods with focus on the optimization of cache locality. However, there is a compromise between cache locality and workload balance. Little insight has been devoted into the issue of workload imbalance for the underlying multicore system, which degrades the effectiveness of parallel graph processing. In this work, a measurement approach is proposed to quantify the imbalance incurred by the concentration of vertices. Inspired by it, we present Cache-aware Reorder (Corder) , a lightweight reordering method exploiting the cache hierarchy of multicore systems. At the shared-memory level, Corder promotes even distribution of computation loads amongst multicores. 
At the private-cache level, Corder facilitates cache efficiency by applying further refinement to local vertex order. Comprehensive performance evaluation of Corder is conducted on various graph applications and datasets. Experimental results show that Corder yields speedup of up to <inline-formula><tex-math notation="LaTeX">2.59\times</tex-math> <mml:math><mml:mrow><mml:mn>2</mml:mn><mml:mo>.</mml:mo><mml:mn>59</mml:mn><mml:mo>×</mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="chung-ieq1-3105323.gif"/> </inline-formula> and on average <inline-formula><tex-math notation="LaTeX">1.45\times</tex-math> <mml:math><mml:mrow><mml:mn>1</mml:mn><mml:mo>.</mml:mo><mml:mn>45</mml:mn><mml:mo>×</mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="chung-ieq2-3105323.gif"/> </inline-formula>, which significantly outperforms existing lightweight reordering methods. To identify the root causes of performance boost delivered by Corder, multicore activities are investigated in terms of thread behavior, cache efficiency, and memory utilization. Statistical analysis demonstrates that the issue of imbalanced thread execution time dominates other factors in determining the overall graph processing time. Moreover, Corder achieves remarkable advantages in cross-platform scalability and reordering overhead.]]></abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TPDS.2021.3105323</doi><tpages>15</tpages><orcidid>https://orcid.org/0000-0002-3392-8388</orcidid><orcidid>https://orcid.org/0000-0002-8704-9821</orcidid></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1045-9219 |
ispartof | IEEE transactions on parallel and distributed systems, 2022-05, Vol.33 (5), p.1231-1245 |
issn | 1045-9219 (print) ; 1558-2183 (electronic) |
language | eng |
recordid | cdi_crossref_primary_10_1109_TPDS_2021_3105323 |
source | IEEE Electronic Library Online |
subjects | Apexes ; Blogs ; cache locality ; Data structures ; graph processing ; Instruction sets ; Lightweight ; Load balancing ; Multicore processing ; Multicore system ; Optimization ; Parallel processing ; Performance evaluation ; Social networking (online) ; Sorting ; Statistical analysis ; Workload ; workload balance ; Workloads |
title | Workload Balancing via Graph Reordering on Multicore Systems |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T06%3A54%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Workload%20Balancing%20via%20Graph%20Reordering%20on%20Multicore%20Systems&rft.jtitle=IEEE%20transactions%20on%20parallel%20and%20distributed%20systems&rft.au=Chen,%20YuAng&rft.date=2022-05-01&rft.volume=33&rft.issue=5&rft.spage=1231&rft.epage=1245&rft.pages=1231-1245&rft.issn=1045-9219&rft.eissn=1558-2183&rft.coden=ITDSEO&rft_id=info:doi/10.1109/TPDS.2021.3105323&rft_dat=%3Cproquest_RIE%3E2583636685%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2583636685&rft_id=info:pmid/&rft_ieee_id=9516878&rfr_iscdi=true |
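The abstract argues that locality-oriented reordering can concentrate heavy vertices in one thread's share of the work. A minimal sketch in Python illustrates that effect under stated assumptions: each thread processes one contiguous range of new vertex IDs (a static schedule), and a vertex's out-degree stands in for its work. The functions `chunk_bounds`, `chunk_work`, `imbalance`, `degree_sort`, and `balanced_degree_order` are hypothetical illustrations. The max-to-mean ratio is a common imbalance metric, not necessarily the paper's measurement approach, and the round-robin order only mimics the abstract's two-level idea; it is not the Corder algorithm.

```python
from typing import List, Sequence, Tuple


def chunk_bounds(n: int, threads: int) -> List[Tuple[int, int]]:
    """Contiguous [lo, hi) ID ranges, one per thread (static schedule).
    The first n % threads chunks receive one extra vertex."""
    base, extra = divmod(n, threads)
    bounds, lo = [], 0
    for t in range(threads):
        hi = lo + base + (1 if t < extra else 0)
        bounds.append((lo, hi))
        lo = hi
    return bounds


def chunk_work(order: Sequence[int], degree: Sequence[int], threads: int) -> List[int]:
    """Edges each thread processes when `order` (new ID -> old vertex)
    is split into contiguous per-thread chunks."""
    return [sum(degree[v] for v in order[lo:hi])
            for lo, hi in chunk_bounds(len(order), threads)]


def imbalance(work: Sequence[int]) -> float:
    """Max-to-mean ratio of per-thread work; 1.0 means perfectly balanced."""
    mean = sum(work) / len(work)
    return max(work) / mean if mean else 1.0


def degree_sort(degree: Sequence[int]) -> List[int]:
    """Global descending-degree order: all hubs land in the first chunk."""
    return sorted(range(len(degree)), key=lambda v: -degree[v])


def balanced_degree_order(degree: Sequence[int], threads: int) -> List[int]:
    """Hypothetical two-level order (not Corder itself): deal degree-sorted
    vertices round-robin into one bucket per thread so hubs are spread
    across threads, then concatenate the buckets. Bucket sizes match
    chunk_bounds, and each bucket stays degree-sorted for local locality."""
    buckets: List[List[int]] = [[] for _ in range(threads)]
    for i, v in enumerate(degree_sort(degree)):
        buckets[i % threads].append(v)
    return [v for bucket in buckets for v in bucket]


if __name__ == "__main__":
    # Skewed, power-law-like out-degrees for 16 vertices, 4 threads.
    deg = [50, 40, 30, 20, 5, 4, 3, 3, 2, 2, 1, 1, 1, 1, 1, 1]
    for name, order in [("degree sort", degree_sort(deg)),
                        ("balanced", balanced_degree_order(deg, 4))]:
        work = chunk_work(order, deg, 4)
        print(f"{name}: work={work} imbalance={imbalance(work):.2f}")
```

On this toy degree list, the per-thread work under a plain degree sort is [140, 15, 6, 4], while the round-robin order gives [58, 47, 35, 25]: the same total, with the slowest thread (which bounds the parallel step time) doing far less. Round-robin dealing is chosen here only because it keeps every bucket the same size as its static chunk; a real reordering would also have to weigh cache behavior.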