Four node graphlet and triad enumeration on distributed platforms

Graphlet enumeration is a basic task in graph analysis with many applications. Thus it is important to be able to perform this task within a reasonable amount of time. However, this objective is challenging when the input graph is very large, with millions of nodes and edges. Known solutions are lim...


Bibliographic details
Published in:	Distributed and parallel databases : an international journal 2022, Vol.40 (2-3), p.335-372
Main authors:	Santoso, Yudi; Liu, Xiaozhou; Srinivasan, Venkatesh; Thomo, Alex
Format:	Article
Language:	eng
Subjects:	Computer networks; Computer Science; Data Structures; Database Management; Distributed processing; Enumeration; Graph theory; Graphs; Information Systems Applications (incl.Internet); Memory Structures; Operating Systems; Special Issue on Scientific and Statistical Data Management in the Age of AI 2021
Online access:	Full text

container_end_page	372
container_issue	2-3
container_start_page	335
container_title	Distributed and parallel databases : an international journal
container_volume	40
creator	Santoso, Yudi ; Liu, Xiaozhou ; Srinivasan, Venkatesh ; Thomo, Alex
description	Graphlet enumeration is a basic task in graph analysis with many applications. Thus it is important to be able to perform this task within a reasonable amount of time. However, this objective is challenging when the input graph is very large, with millions of nodes and edges. Known solutions are limited in terms of the scale of the graph that they can process. Distributed computing is often proposed as a solution to improve the maximum scale. However, it has to be done carefully to reduce the overhead cost and to really benefit from the distributed solution. We study the enumeration of four-node graphlets in undirected graphs and triads in directed graphs using a distributed platform. We propose an efficient distributed solution that significantly surpasses the existing solutions on the scale and performance. With this method, we are able to process larger graphs that have never been processed before and enumerate quadrillions of graphlets using a modest cluster of machines. Our experimental results show that our solution has a strong machine scalability close to one.
doi_str_mv	10.1007/s10619-022-07416-8
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2711891029</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2711891029</sourcerecordid><originalsourceid>FETCH-LOGICAL-c200t-bac414384a36e3e3e2084375e377b5bd30cff056a80010f379ae21c4515b16363</originalsourceid><addsrcrecordid>eNp9kE9LxDAQxYMouK5-AU8Bz9GZpGnS47K4rrDgRc8hbdO1S_-ZpAe_vdEK3mQGBh7vzQw_Qm4R7hFAPQSEHAsGnDNQGeZMn5EVSiWYkkqfkxUUPIlK80tyFcIJAAqFakU2u3H2dBhrR4_eTu-di9QONY2-tTV1w9w7b2M7DjR13Yakl3N0NZ06G5vR9-GaXDS2C-7md67J2-7xdbtnh5en5-3mwCoOEFlpqwwzoTMrcidScdCZUNIJpUpZ1gKqpgGZWw2A0AhVWMexyiTKEnORizW5W_ZOfvyYXYjmlF4f0knDFaIuEHiRXHxxVX4MwbvGTL7trf80COYblVlQmYTK_KAyOoXEEgrJPByd_1v9T-oLQNBq-A</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2711891029</pqid></control><display><type>article</type><title>Four node graphlet and triad enumeration on distributed platforms</title><source>Springer Nature - Complete Springer Journals</source><creator>Santoso, Yudi ; Liu, Xiaozhou ; Srinivasan, Venkatesh ; Thomo, Alex</creator><creatorcontrib>Santoso, Yudi ; Liu, Xiaozhou ; Srinivasan, Venkatesh ; Thomo, Alex</creatorcontrib><description>Graphlet enumeration is a basic task in graph analysis with many applications. Thus it is important to be able to perform this task within a reasonable amount of time. However, this objective is challenging when the input graph is very large, with millions of nodes and edges. Known solutions are limited in terms of the scale of the graph that they can process. Distributed computing is often proposed as a solution to improve the maximum scale. However, it has to be done carefully to reduce the overhead cost and to really benefit from the distributed solution. We study the enumeration of four-node graphlets in undirected graphs and triads in directed graphs using a distributed platform. 
We propose an efficient distributed solution that significantly surpasses the existing solutions on the scale and performance. With this method, we are able to process larger graphs that have never been processed before and enumerate quadrillions of graphlets using a modest cluster of machines. Our experimental results show that our solution has a strong machine scalability close to one.</description><identifier>ISSN: 0926-8782</identifier><identifier>EISSN: 1573-7578</identifier><identifier>DOI: 10.1007/s10619-022-07416-8</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Computer networks ; Computer Science ; Data Structures ; Database Management ; Distributed processing ; Enumeration ; Graph theory ; Graphs ; Information Systems Applications (incl.Internet) ; Memory Structures ; Operating Systems ; Special Issue on Scientific and Statistical Data Management in the Age of AI 2021</subject><ispartof>Distributed and parallel databases : an international journal, 2022, Vol.40 (2-3), p.335-372</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022. Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022. 
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c200t-bac414384a36e3e3e2084375e377b5bd30cff056a80010f379ae21c4515b16363</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10619-022-07416-8$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10619-022-07416-8$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27903,27904,41467,42536,51297</link.rule.ids></links><search><creatorcontrib>Santoso, Yudi</creatorcontrib><creatorcontrib>Liu, Xiaozhou</creatorcontrib><creatorcontrib>Srinivasan, Venkatesh</creatorcontrib><creatorcontrib>Thomo, Alex</creatorcontrib><title>Four node graphlet and triad enumeration on distributed platforms</title><title>Distributed and parallel databases : an international journal</title><addtitle>Distrib Parallel Databases</addtitle><description>Graphlet enumeration is a basic task in graph analysis with many applications. Thus it is important to be able to perform this task within a reasonable amount of time. However, this objective is challenging when the input graph is very large, with millions of nodes and edges. Known solutions are limited in terms of the scale of the graph that they can process. Distributed computing is often proposed as a solution to improve the maximum scale. However, it has to be done carefully to reduce the overhead cost and to really benefit from the distributed solution. 
We study the enumeration of four-node graphlets in undirected graphs and triads in directed graphs using a distributed platform. We propose an efficient distributed solution that significantly surpasses the existing solutions on the scale and performance. With this method, we are able to process larger graphs that have never been processed before and enumerate quadrillions of graphlets using a modest cluster of machines. Our experimental results show that our solution has a strong machine scalability close to one.</description><subject>Computer networks</subject><subject>Computer Science</subject><subject>Data Structures</subject><subject>Database Management</subject><subject>Distributed processing</subject><subject>Enumeration</subject><subject>Graph theory</subject><subject>Graphs</subject><subject>Information Systems Applications (incl.Internet)</subject><subject>Memory Structures</subject><subject>Operating Systems</subject><subject>Special Issue on Scientific and Statistical Data Management in the Age of AI 2021</subject><issn>0926-8782</issn><issn>1573-7578</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9kE9LxDAQxYMouK5-AU8Bz9GZpGnS47K4rrDgRc8hbdO1S_-ZpAe_vdEK3mQGBh7vzQw_Qm4R7hFAPQSEHAsGnDNQGeZMn5EVSiWYkkqfkxUUPIlK80tyFcIJAAqFakU2u3H2dBhrR4_eTu-di9QONY2-tTV1w9w7b2M7DjR13Yakl3N0NZ06G5vR9-GaXDS2C-7md67J2-7xdbtnh5en5-3mwCoOEFlpqwwzoTMrcidScdCZUNIJpUpZ1gKqpgGZWw2A0AhVWMexyiTKEnORizW5W_ZOfvyYXYjmlF4f0knDFaIuEHiRXHxxVX4MwbvGTL7trf80COYblVlQmYTK_KAyOoXEEgrJPByd_1v9T-oLQNBq-A</recordid><startdate>2022</startdate><enddate>2022</enddate><creator>Santoso, Yudi</creator><creator>Liu, Xiaozhou</creator><creator>Srinivasan, Venkatesh</creator><creator>Thomo, Alex</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>2022</creationdate><title>Four node graphlet and triad enumeration on 
distributed platforms</title><author>Santoso, Yudi ; Liu, Xiaozhou ; Srinivasan, Venkatesh ; Thomo, Alex</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c200t-bac414384a36e3e3e2084375e377b5bd30cff056a80010f379ae21c4515b16363</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Computer networks</topic><topic>Computer Science</topic><topic>Data Structures</topic><topic>Database Management</topic><topic>Distributed processing</topic><topic>Enumeration</topic><topic>Graph theory</topic><topic>Graphs</topic><topic>Information Systems Applications (incl.Internet)</topic><topic>Memory Structures</topic><topic>Operating Systems</topic><topic>Special Issue on Scientific and Statistical Data Management in the Age of AI 2021</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Santoso, Yudi</creatorcontrib><creatorcontrib>Liu, Xiaozhou</creatorcontrib><creatorcontrib>Srinivasan, Venkatesh</creatorcontrib><creatorcontrib>Thomo, Alex</creatorcontrib><collection>CrossRef</collection><jtitle>Distributed and parallel databases : an international journal</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Santoso, Yudi</au><au>Liu, Xiaozhou</au><au>Srinivasan, Venkatesh</au><au>Thomo, Alex</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Four node graphlet and triad enumeration on distributed platforms</atitle><jtitle>Distributed and parallel databases : an international journal</jtitle><stitle>Distrib Parallel Databases</stitle><date>2022</date><risdate>2022</risdate><volume>40</volume><issue>2-3</issue><spage>335</spage><epage>372</epage><pages>335-372</pages><issn>0926-8782</issn><eissn>1573-7578</eissn><abstract>Graphlet enumeration is a basic task in graph analysis with many applications. 
Thus it is important to be able to perform this task within a reasonable amount of time. However, this objective is challenging when the input graph is very large, with millions of nodes and edges. Known solutions are limited in terms of the scale of the graph that they can process. Distributed computing is often proposed as a solution to improve the maximum scale. However, it has to be done carefully to reduce the overhead cost and to really benefit from the distributed solution. We study the enumeration of four-node graphlets in undirected graphs and triads in directed graphs using a distributed platform. We propose an efficient distributed solution that significantly surpasses the existing solutions on the scale and performance. With this method, we are able to process larger graphs that have never been processed before and enumerate quadrillions of graphlets using a modest cluster of machines. Our experimental results show that our solution has a strong machine scalability close to one.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10619-022-07416-8</doi><tpages>38</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0926-8782
ispartof	Distributed and parallel databases : an international journal, 2022, Vol.40 (2-3), p.335-372
issn	0926-8782 1573-7578
language	eng
recordid	cdi_proquest_journals_2711891029
source	Springer Nature - Complete Springer Journals
subjects	Computer networks ; Computer Science ; Data Structures ; Database Management ; Distributed processing ; Enumeration ; Graph theory ; Graphs ; Information Systems Applications (incl.Internet) ; Memory Structures ; Operating Systems ; Special Issue on Scientific and Statistical Data Management in the Age of AI 2021
title	Four node graphlet and triad enumeration on distributed platforms
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T16%3A04%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Four%20node%20graphlet%20and%20triad%20enumeration%20on%20distributed%20platforms&rft.jtitle=Distributed%20and%20parallel%20databases%20:%20an%20international%20journal&rft.au=Santoso,%20Yudi&rft.date=2022&rft.volume=40&rft.issue=2-3&rft.spage=335&rft.epage=372&rft.pages=335-372&rft.issn=0926-8782&rft.eissn=1573-7578&rft_id=info:doi/10.1007/s10619-022-07416-8&rft_dat=%3Cproquest_cross%3E2711891029%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2711891029&rft_id=info:pmid/&rfr_iscdi=true