Top-k Graph Similarity Search Algorithm Based on Chi-Square Statistics in Probabilistic Graphs

Top-k graph similarity search on probabilistic graphs is widely used in various scenarios, such as symptom–disease diagnostics, community discovery, visual pattern recognition, and communication networks. The state-of-the-art method uses the chi-square statistics to speed up the process. The effecti...

Detailed description

Bibliographic details
Published in: Electronics (Basel) 2024-01, Vol.13 (1), p.192
Main authors: Chen, Ziyang, Zhuang, Junhao, Wang, Xuan, Tang, Xian, Yang, Kun, Du, Ming, Zhou, Junfeng
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue 1
container_start_page 192
container_title Electronics (Basel)
container_volume 13
creator Chen, Ziyang
Zhuang, Junhao
Wang, Xuan
Tang, Xian
Yang, Kun
Du, Ming
Zhou, Junfeng
description Top-k graph similarity search on probabilistic graphs is widely used in scenarios such as symptom–disease diagnostics, community discovery, visual pattern recognition, and communication networks. The state-of-the-art method uses chi-square statistics to speed up the search. The effectiveness of a chi-square-based solution depends on the quality of the sample observation and expectation. The existing method assumes that the labels in the data graphs follow a uniform distribution and calculates the chi-square value on that basis. In practice, however, the actual distribution of the labels is not uniform, which lowers the quality of the returned results. To solve this problem, we propose ChiSSA, a top-k similar subgraph search algorithm based on chi-square statistics. We propose two ways to calculate the expectation vector according to the actual distribution of labels in the graph: a local expectation calculation method based on the vertex neighbors and a global expectation calculation method based on the label distribution of the whole graph. Furthermore, we propose two optimization strategies to improve the accuracy of query results and the efficiency of our algorithm. We conduct extensive experiments on real datasets. The results show that our algorithm improves quality and accuracy by an average of 1.66× and 1.68×, respectively, and improves time overhead by an average of 3.41×.
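The contrast the abstract draws between a uniform expectation and an expectation derived from the actual label distribution can be illustrated with a small sketch. This is not the ChiSSA algorithm itself, whose formulas are not given in this record. It is the textbook Pearson chi-square statistic applied to label counts, and the function names, the toy label list, and the sample neighborhood are all hypothetical.

```python
from collections import Counter


def uniform_expectation(graph_labels, sample_size):
    """Expected count per label if every distinct label were equally likely."""
    distinct = set(graph_labels)
    return {label: sample_size / len(distinct) for label in distinct}


def global_expectation(graph_labels, sample_size):
    """Expected count per label, proportional to its frequency in the whole graph."""
    counts = Counter(graph_labels)
    total = len(graph_labels)
    return {label: sample_size * c / total for label, c in counts.items()}


def chi_square(observed, expected):
    """Pearson chi-square: sum over labels of (observed - expected)^2 / expected."""
    return sum((observed.get(label, 0) - e) ** 2 / e for label, e in expected.items())


# Hypothetical data: a skewed label distribution over 10 vertices,
# and the labels seen in one sampled neighborhood of 5 vertices.
graph_labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
neighborhood = ["a", "a", "a", "b", "b"]
observed = Counter(neighborhood)

chi_uniform = chi_square(observed, uniform_expectation(graph_labels, len(neighborhood)))
chi_global = chi_square(observed, global_expectation(graph_labels, len(neighborhood)))
print(f"uniform expectation: {chi_uniform:.3f}")
print(f"global expectation:  {chi_global:.3f}")
```

In this toy case the neighborhood simply mirrors the skew of the whole graph, yet the uniform assumption scores it as a much larger deviation than the distribution-aware expectation does, which is the kind of distortion the abstract attributes to the uniform assumption.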
doi_str_mv 10.3390/electronics13010192
format Article
publisher Basel: MDPI AG
rights 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
orcid https://orcid.org/0000-0001-6494-5319
fulltext fulltext
identifier ISSN: 2079-9292
ispartof Electronics (Basel), 2024-01, Vol.13 (1), p.192
issn 2079-9292
2079-9292
language eng
recordid cdi_proquest_journals_2912642040
source MDPI - Multidisciplinary Digital Publishing Institute; EZB-FREE-00999 freely available EZB journals
subjects Algorithms
Analysis
Chi-square test
Colds
Communication networks
Datasets
Effectiveness
Efficiency
Graph theory
Graphs
Labels
Malaria
Mathematical analysis
Pattern recognition
Probability
Search algorithms
Search theory
Similarity
Social networks
Statistics
title Top-k Graph Similarity Search Algorithm Based on Chi-Square Statistics in Probabilistic Graphs
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T14%3A02%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Top-k%20Graph%20Similarity%20Search%20Algorithm%20Based%20on%20Chi-Square%20Statistics%20in%20Probabilistic%20Graphs&rft.jtitle=Electronics%20(Basel)&rft.au=Chen,%20Ziyang&rft.date=2024-01-01&rft.volume=13&rft.issue=1&rft.spage=192&rft.pages=192-&rft.issn=2079-9292&rft.eissn=2079-9292&rft_id=info:doi/10.3390/electronics13010192&rft_dat=%3Cgale_proqu%3EA779132756%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2912642040&rft_id=info:pmid/&rft_galeid=A779132756&rfr_iscdi=true