Four node graphlet and triad enumeration on distributed platforms

Graphlet enumeration is a basic task in graph analysis with many applications. Thus it is important to be able to perform this task within a reasonable amount of time. However, this objective is challenging when the input graph is very large, with millions of nodes and edges. Known solutions are lim...

Bibliographic details
Published in: Distributed and parallel databases : an international journal 2022, Vol.40 (2-3), p.335-372
Main authors: Santoso, Yudi; Liu, Xiaozhou; Srinivasan, Venkatesh; Thomo, Alex
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 372
container_issue 2-3
container_start_page 335
container_title Distributed and parallel databases : an international journal
container_volume 40
creator Santoso, Yudi
Liu, Xiaozhou
Srinivasan, Venkatesh
Thomo, Alex
description Graphlet enumeration is a basic task in graph analysis with many applications. Thus it is important to be able to perform this task within a reasonable amount of time. However, this objective is challenging when the input graph is very large, with millions of nodes and edges. Known solutions are limited in terms of the scale of the graph that they can process. Distributed computing is often proposed as a solution to improve the maximum scale. However, it has to be done carefully to reduce the overhead cost and to really benefit from the distributed solution. We study the enumeration of four-node graphlets in undirected graphs and triads in directed graphs using a distributed platform. We propose an efficient distributed solution that significantly surpasses the existing solutions on the scale and performance. With this method, we are able to process larger graphs that have never been processed before and enumerate quadrillions of graphlets using a modest cluster of machines. Our experimental results show that our solution has a strong machine scalability close to one.
doi_str_mv 10.1007/s10619-022-07416-8
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2711891029</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2711891029</sourcerecordid><originalsourceid>FETCH-LOGICAL-c200t-bac414384a36e3e3e2084375e377b5bd30cff056a80010f379ae21c4515b16363</originalsourceid><addsrcrecordid>eNp9kE9LxDAQxYMouK5-AU8Bz9GZpGnS47K4rrDgRc8hbdO1S_-ZpAe_vdEK3mQGBh7vzQw_Qm4R7hFAPQSEHAsGnDNQGeZMn5EVSiWYkkqfkxUUPIlK80tyFcIJAAqFakU2u3H2dBhrR4_eTu-di9QONY2-tTV1w9w7b2M7DjR13Yakl3N0NZ06G5vR9-GaXDS2C-7md67J2-7xdbtnh5en5-3mwCoOEFlpqwwzoTMrcidScdCZUNIJpUpZ1gKqpgGZWw2A0AhVWMexyiTKEnORizW5W_ZOfvyYXYjmlF4f0knDFaIuEHiRXHxxVX4MwbvGTL7trf80COYblVlQmYTK_KAyOoXEEgrJPByd_1v9T-oLQNBq-A</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2711891029</pqid></control><display><type>article</type><title>Four node graphlet and triad enumeration on distributed platforms</title><source>Springer Nature - Complete Springer Journals</source><creator>Santoso, Yudi ; Liu, Xiaozhou ; Srinivasan, Venkatesh ; Thomo, Alex</creator><creatorcontrib>Santoso, Yudi ; Liu, Xiaozhou ; Srinivasan, Venkatesh ; Thomo, Alex</creatorcontrib><description>Graphlet enumeration is a basic task in graph analysis with many applications. Thus it is important to be able to perform this task within a reasonable amount of time. However, this objective is challenging when the input graph is very large, with millions of nodes and edges. Known solutions are limited in terms of the scale of the graph that they can process. Distributed computing is often proposed as a solution to improve the maximum scale. However, it has to be done carefully to reduce the overhead cost and to really benefit from the distributed solution. We study the enumeration of four-node graphlets in undirected graphs and triads in directed graphs using a distributed platform. 
We propose an efficient distributed solution that significantly surpasses the existing solutions on the scale and performance. With this method, we are able to process larger graphs that have never been processed before and enumerate quadrillions of graphlets using a modest cluster of machines. Our experimental results show that our solution has a strong machine scalability close to one.</description><identifier>ISSN: 0926-8782</identifier><identifier>EISSN: 1573-7578</identifier><identifier>DOI: 10.1007/s10619-022-07416-8</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Computer networks ; Computer Science ; Data Structures ; Database Management ; Distributed processing ; Enumeration ; Graph theory ; Graphs ; Information Systems Applications (incl.Internet) ; Memory Structures ; Operating Systems ; Special Issue on Scientific and Statistical Data Management in the Age of AI 2021</subject><ispartof>Distributed and parallel databases : an international journal, 2022, Vol.40 (2-3), p.335-372</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022. Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022. 
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c200t-bac414384a36e3e3e2084375e377b5bd30cff056a80010f379ae21c4515b16363</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10619-022-07416-8$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10619-022-07416-8$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27903,27904,41467,42536,51297</link.rule.ids></links><search><creatorcontrib>Santoso, Yudi</creatorcontrib><creatorcontrib>Liu, Xiaozhou</creatorcontrib><creatorcontrib>Srinivasan, Venkatesh</creatorcontrib><creatorcontrib>Thomo, Alex</creatorcontrib><title>Four node graphlet and triad enumeration on distributed platforms</title><title>Distributed and parallel databases : an international journal</title><addtitle>Distrib Parallel Databases</addtitle><description>Graphlet enumeration is a basic task in graph analysis with many applications. Thus it is important to be able to perform this task within a reasonable amount of time. However, this objective is challenging when the input graph is very large, with millions of nodes and edges. Known solutions are limited in terms of the scale of the graph that they can process. Distributed computing is often proposed as a solution to improve the maximum scale. However, it has to be done carefully to reduce the overhead cost and to really benefit from the distributed solution. 
We study the enumeration of four-node graphlets in undirected graphs and triads in directed graphs using a distributed platform. We propose an efficient distributed solution that significantly surpasses the existing solutions on the scale and performance. With this method, we are able to process larger graphs that have never been processed before and enumerate quadrillions of graphlets using a modest cluster of machines. Our experimental results show that our solution has a strong machine scalability close to one.</description><subject>Computer networks</subject><subject>Computer Science</subject><subject>Data Structures</subject><subject>Database Management</subject><subject>Distributed processing</subject><subject>Enumeration</subject><subject>Graph theory</subject><subject>Graphs</subject><subject>Information Systems Applications (incl.Internet)</subject><subject>Memory Structures</subject><subject>Operating Systems</subject><subject>Special Issue on Scientific and Statistical Data Management in the Age of AI 2021</subject><issn>0926-8782</issn><issn>1573-7578</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9kE9LxDAQxYMouK5-AU8Bz9GZpGnS47K4rrDgRc8hbdO1S_-ZpAe_vdEK3mQGBh7vzQw_Qm4R7hFAPQSEHAsGnDNQGeZMn5EVSiWYkkqfkxUUPIlK80tyFcIJAAqFakU2u3H2dBhrR4_eTu-di9QONY2-tTV1w9w7b2M7DjR13Yakl3N0NZ06G5vR9-GaXDS2C-7md67J2-7xdbtnh5en5-3mwCoOEFlpqwwzoTMrcidScdCZUNIJpUpZ1gKqpgGZWw2A0AhVWMexyiTKEnORizW5W_ZOfvyYXYjmlF4f0knDFaIuEHiRXHxxVX4MwbvGTL7trf80COYblVlQmYTK_KAyOoXEEgrJPByd_1v9T-oLQNBq-A</recordid><startdate>2022</startdate><enddate>2022</enddate><creator>Santoso, Yudi</creator><creator>Liu, Xiaozhou</creator><creator>Srinivasan, Venkatesh</creator><creator>Thomo, Alex</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>2022</creationdate><title>Four node graphlet and triad enumeration on 
distributed platforms</title><author>Santoso, Yudi ; Liu, Xiaozhou ; Srinivasan, Venkatesh ; Thomo, Alex</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c200t-bac414384a36e3e3e2084375e377b5bd30cff056a80010f379ae21c4515b16363</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Computer networks</topic><topic>Computer Science</topic><topic>Data Structures</topic><topic>Database Management</topic><topic>Distributed processing</topic><topic>Enumeration</topic><topic>Graph theory</topic><topic>Graphs</topic><topic>Information Systems Applications (incl.Internet)</topic><topic>Memory Structures</topic><topic>Operating Systems</topic><topic>Special Issue on Scientific and Statistical Data Management in the Age of AI 2021</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Santoso, Yudi</creatorcontrib><creatorcontrib>Liu, Xiaozhou</creatorcontrib><creatorcontrib>Srinivasan, Venkatesh</creatorcontrib><creatorcontrib>Thomo, Alex</creatorcontrib><collection>CrossRef</collection><jtitle>Distributed and parallel databases : an international journal</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Santoso, Yudi</au><au>Liu, Xiaozhou</au><au>Srinivasan, Venkatesh</au><au>Thomo, Alex</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Four node graphlet and triad enumeration on distributed platforms</atitle><jtitle>Distributed and parallel databases : an international journal</jtitle><stitle>Distrib Parallel Databases</stitle><date>2022</date><risdate>2022</risdate><volume>40</volume><issue>2-3</issue><spage>335</spage><epage>372</epage><pages>335-372</pages><issn>0926-8782</issn><eissn>1573-7578</eissn><abstract>Graphlet enumeration is a basic task in graph analysis with many applications. 
Thus it is important to be able to perform this task within a reasonable amount of time. However, this objective is challenging when the input graph is very large, with millions of nodes and edges. Known solutions are limited in terms of the scale of the graph that they can process. Distributed computing is often proposed as a solution to improve the maximum scale. However, it has to be done carefully to reduce the overhead cost and to really benefit from the distributed solution. We study the enumeration of four-node graphlets in undirected graphs and triads in directed graphs using a distributed platform. We propose an efficient distributed solution that significantly surpasses the existing solutions on the scale and performance. With this method, we are able to process larger graphs that have never been processed before and enumerate quadrillions of graphlets using a modest cluster of machines. Our experimental results show that our solution has a strong machine scalability close to one.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10619-022-07416-8</doi><tpages>38</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0926-8782
ispartof Distributed and parallel databases : an international journal, 2022, Vol.40 (2-3), p.335-372
issn 0926-8782
1573-7578
language eng
recordid cdi_proquest_journals_2711891029
source Springer Nature - Complete Springer Journals
subjects Computer networks
Computer Science
Data Structures
Database Management
Distributed processing
Enumeration
Graph theory
Graphs
Information Systems Applications (incl.Internet)
Memory Structures
Operating Systems
Special Issue on Scientific and Statistical Data Management in the Age of AI 2021
title Four node graphlet and triad enumeration on distributed platforms
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T16%3A04%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Four%20node%20graphlet%20and%20triad%20enumeration%20on%20distributed%20platforms&rft.jtitle=Distributed%20and%20parallel%20databases%20:%20an%20international%20journal&rft.au=Santoso,%20Yudi&rft.date=2022&rft.volume=40&rft.issue=2-3&rft.spage=335&rft.epage=372&rft.pages=335-372&rft.issn=0926-8782&rft.eissn=1573-7578&rft_id=info:doi/10.1007/s10619-022-07416-8&rft_dat=%3Cproquest_cross%3E2711891029%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2711891029&rft_id=info:pmid/&rfr_iscdi=true