Scalable GPU graph traversal

Breadth-first search (BFS) is a core primitive for graph traversal and a basis for many higher-level graph analysis algorithms. It is also representative of a class of parallel computations whose memory accesses and work distribution are both irregular and data-dependent. Recent work has demonstrate...

Bibliographic details
Published in:	SIGPLAN notices 2012-08, Vol.47 (8), p.117-128
Main authors:	Merrill, Duane; Garland, Michael; Grimshaw, Andrew
Format:	Article
Language:	eng
Online access:	Full text

container_end_page	128
container_issue	8
container_start_page	117
container_title	SIGPLAN notices
container_volume	47
creator	Merrill, Duane ; Garland, Michael ; Grimshaw, Andrew
description	Breadth-first search (BFS) is a core primitive for graph traversal and a basis for many higher-level graph analysis algorithms. It is also representative of a class of parallel computations whose memory accesses and work distribution are both irregular and data-dependent. Recent work has demonstrated the plausibility of GPU sparse graph traversal, but has tended to focus on asymptotically inefficient algorithms that perform poorly on graphs with non-trivial diameter. We present a BFS parallelization focused on fine-grained task management constructed from efficient prefix sum that achieves an asymptotically optimal O(|V| + |E|) work complexity. Our implementation delivers excellent performance on diverse graphs, achieving traversal rates in excess of 3.3 billion and 8.3 billion traversed edges per second using single and quad-GPU configurations, respectively. This level of performance is several times faster than state-of-the-art implementations on both CPU and GPU platforms.
doi_str_mv	10.1145/2370036.2145832
format	Article
fulltext	fulltext
identifier	ISSN: 0362-1340
ispartof	SIGPLAN notices, 2012-08, Vol.47 (8), p.117-128
issn	0362-1340 1558-1160
language	eng
recordid	cdi_crossref_primary_10_1145_2370036_2145832
source	ACM Digital Library Complete
title	Scalable GPU graph traversal
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T16%3A35%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Scalable%20GPU%20graph%20traversal&rft.jtitle=SIGPLAN%20notices&rft.au=Merrill,%20Duane&rft.date=2012-08-01&rft.volume=47&rft.issue=8&rft.spage=117&rft.epage=128&rft.pages=117-128&rft.issn=0362-1340&rft.eissn=1558-1160&rft_id=info:doi/10.1145/2370036.2145832&rft_dat=%3Ccrossref%3E10_1145_2370036_2145832%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true