Top-k Graph Similarity Search Algorithm Based on Chi-Square Statistics in Probabilistic Graphs

Top-k graph similarity search on probabilistic graphs is widely used in scenarios such as symptom–disease diagnostics, community discovery, visual pattern recognition, and communication networks. The state-of-the-art method uses chi-square statistics to speed up the search. The effecti...
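
The abstract's central point is that a chi-square value is only as good as the expectation it is compared against. The following is a minimal sketch of that idea, not the ChiSSA algorithm itself: it computes Pearson's chi-square statistic for the label counts of one small vertex neighborhood, once against a uniform expectation and once against an expectation scaled from the label frequencies of the whole graph. The toy data and all function names (`chi_square`, `uniform_expectation`, `global_expectation`) are hypothetical and chosen only for illustration; the exact formulas used in the article are not given in this record.

```python
from collections import Counter


def chi_square(observed, expected):
    """Pearson chi-square: sum over labels of (O - E)^2 / E."""
    return sum(
        (observed.get(label, 0) - e) ** 2 / e
        for label, e in expected.items()
        if e > 0
    )


def uniform_expectation(labels, n):
    """Expected counts if all labels were equally likely (the uniform assumption)."""
    return {label: n / len(labels) for label in labels}


def global_expectation(graph_labels, n):
    """Expected counts scaled from the label frequencies of the whole graph."""
    freq = Counter(graph_labels)
    total = len(graph_labels)
    return {label: n * count / total for label, count in freq.items()}


# Hypothetical toy data: a skewed label distribution over ten vertices.
graph_labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
# Labels observed in one (hypothetical) vertex neighborhood.
neighborhood = ["a", "a", "b", "c"]

observed = Counter(neighborhood)
n = len(neighborhood)

chi_uniform = chi_square(observed, uniform_expectation(sorted(set(graph_labels)), n))
chi_global = chi_square(observed, global_expectation(graph_labels, n))

print(f"uniform expectation: {chi_uniform:.3f}")
print(f"global expectation:  {chi_global:.3f}")
```

In this toy case the two expectations give different statistics for the same neighborhood, because the rare label `c` is far more surprising under the skewed global distribution than under the uniform one. A ranking built on such values can therefore change with the choice of expectation.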

Detailed description

Bibliographic details
Published in:	Electronics (Basel) 2024-01, Vol.13 (1), p.192
Main authors:	Chen, Ziyang; Zhuang, Junhao; Wang, Xuan; Tang, Xian; Yang, Kun; Du, Ming; Zhou, Junfeng
Format:	Article
Language:	eng
Subjects:	Algorithms; Analysis; Chi-square test; Colds; Communication networks; Datasets; Effectiveness; Efficiency; Graph theory; Graphs; Labels; Malaria; Mathematical analysis; Pattern recognition; Probability; Search algorithms; Search theory; Similarity; Social networks; Statistics
Online access:	Full text

container_end_page
container_issue	1
container_start_page	192
container_title	Electronics (Basel)
container_volume	13
creator	Chen, Ziyang; Zhuang, Junhao; Wang, Xuan; Tang, Xian; Yang, Kun; Du, Ming; Zhou, Junfeng
description	Top-k graph similarity search on probabilistic graphs is widely used in scenarios such as symptom–disease diagnostics, community discovery, visual pattern recognition, and communication networks. The state-of-the-art method uses chi-square statistics to speed up the search. The effectiveness of a chi-square-based solution depends on the quality of the sample observation and expectation. The existing method assumes that the labels in the data graphs follow a uniform distribution and calculates the chi-square value on that basis. In practice, however, the actual label distribution is not uniform, which lowers the quality of the returned results. To solve this problem, we propose ChiSSA, a top-k similar subgraph search algorithm based on chi-square statistics. We propose two ways to calculate the expectation vector from the actual distribution of labels in the graph: a local expectation calculation method based on vertex neighbors and a global expectation calculation method based on the label distribution of the whole graph. Furthermore, we propose two optimization strategies to improve the accuracy of query results and the efficiency of our algorithm. Experiments on real datasets show that our algorithm improves quality and accuracy by an average of 1.66× and 1.68×, respectively; in terms of time overhead, it improves by an average of 3.41×.
doi_str_mv	10.3390/electronics13010192
format	Article
fullrecord	<record><control><sourceid>gale_proqu</sourceid><recordid>TN_cdi_proquest_journals_2912642040</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A779132756</galeid><sourcerecordid>A779132756</sourcerecordid><originalsourceid>FETCH-LOGICAL-c311t-b4f0d808e1222ed8ea1cc1a6f04f411804acc45bdb44ae8e824e9356d46aeffd3</originalsourceid><addsrcrecordid>eNptUMtOwzAQtBBIVKVfwMUS5xS_msexVFCQKoGUciVynHXjksSp7R769xjKgQO7h90d7cxIg9AtJXPOC3IPHajg7GCUp5xQQgt2gSaMZEVSsIJd_tmv0cz7PYlVUJ5zMkEfWzsmn3jt5Nji0vSmk86EEy5BOtXiZbez8W57_CA9NNgOeNWapDwcpQNcBhmMD9EYmwG_OVvL2nQ_yFnR36ArLTsPs985Re9Pj9vVc7J5Xb-slptEcUpDUgtNmpzkQBlj0OQgqVJUppoILSjNiZBKiUXd1EJIyCFnAgq-SBuRStC64VN0d9YdnT0cwYdqb49uiJYVKyhLBSOCxK_5-WsnO6jMoG1wUsVuoDfKDqBNxJdZFtNh2SKNBH4mKGe9d6Cr0ZleulNFSfUdfvVP-PwLlOB7Cg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2912642040</pqid></control><display><type>article</type><title>Top-k Graph Similarity Search Algorithm Based on Chi-Square Statistics in Probabilistic Graphs</title><source>MDPI - Multidisciplinary Digital Publishing Institute</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Chen, Ziyang ; Zhuang, Junhao ; Wang, Xuan ; Tang, Xian ; Yang, Kun ; Du, Ming ; Zhou, Junfeng</creator><creatorcontrib>Chen, Ziyang ; Zhuang, Junhao ; Wang, Xuan ; Tang, Xian ; Yang, Kun ; Du, Ming ; Zhou, Junfeng</creatorcontrib><description>Top-k graph similarity search on probabilistic graphs is widely used in various scenarios, such as symptom–disease diagnostics, community discovery, visual pattern recognition, and communication networks. The state-of-the-art method uses the chi-square statistics to speed up the process. The effectiveness of the chi-square statistics solution depends on the effectiveness of the sample observation and expectation. 
The existing method assumes that the labels in the data graphs are subject to uniform distribution and calculate the chi-square value based on this. In fact, however, the actual distribution of the labels does not meet the requirement of uniform distribution, resulting in a low quality of the returned results. To solve this problem, we propose a top-k similar subgraph search algorithm ChiSSA based on chi-square statistics. We propose two ways to calculate the expectation vector according to the actual distribution of labels in the graph, including the local expectation calculation method based on the vertex neighbors and the global expectation calculation method based on the label distribution of the whole graph. Furthermore, we propose two optimization strategies to improve the accuracy of query results and the efficiency of our algorithm. We conduct rich experiments on real datasets. The experimental results on real datasets show that our algorithm improves the quality and accuracy by an average of 1.66× and 1.68× in terms of time overhead, it improves by an average of 3.41×.</description><identifier>ISSN: 2079-9292</identifier><identifier>EISSN: 2079-9292</identifier><identifier>DOI: 10.3390/electronics13010192</identifier><language>eng</language><publisher>Basel: MDPI AG</publisher><subject>Algorithms ; Analysis ; Chi-square test ; Colds ; Communication networks ; Datasets ; Effectiveness ; Efficiency ; Graph theory ; Graphs ; Labels ; Malaria ; Mathematical analysis ; Pattern recognition ; Probability ; Search algorithms ; Search theory ; Similarity ; Social networks ; Statistics</subject><ispartof>Electronics (Basel), 2024-01, Vol.13 (1), p.192</ispartof><rights>COPYRIGHT 2024 MDPI AG</rights><rights>2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c311t-b4f0d808e1222ed8ea1cc1a6f04f411804acc45bdb44ae8e824e9356d46aeffd3</cites><orcidid>0000-0001-6494-5319</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27922,27923</link.rule.ids></links><search><creatorcontrib>Chen, Ziyang</creatorcontrib><creatorcontrib>Zhuang, Junhao</creatorcontrib><creatorcontrib>Wang, Xuan</creatorcontrib><creatorcontrib>Tang, Xian</creatorcontrib><creatorcontrib>Yang, Kun</creatorcontrib><creatorcontrib>Du, Ming</creatorcontrib><creatorcontrib>Zhou, Junfeng</creatorcontrib><title>Top-k Graph Similarity Search Algorithm Based on Chi-Square Statistics in Probabilistic Graphs</title><title>Electronics (Basel)</title><description>Top-k graph similarity search on probabilistic graphs is widely used in various scenarios, such as symptom–disease diagnostics, community discovery, visual pattern recognition, and communication networks. The state-of-the-art method uses the chi-square statistics to speed up the process. The effectiveness of the chi-square statistics solution depends on the effectiveness of the sample observation and expectation. The existing method assumes that the labels in the data graphs are subject to uniform distribution and calculate the chi-square value based on this. In fact, however, the actual distribution of the labels does not meet the requirement of uniform distribution, resulting in a low quality of the returned results. To solve this problem, we propose a top-k similar subgraph search algorithm ChiSSA based on chi-square statistics. 
We propose two ways to calculate the expectation vector according to the actual distribution of labels in the graph, including the local expectation calculation method based on the vertex neighbors and the global expectation calculation method based on the label distribution of the whole graph. Furthermore, we propose two optimization strategies to improve the accuracy of query results and the efficiency of our algorithm. We conduct rich experiments on real datasets. The experimental results on real datasets show that our algorithm improves the quality and accuracy by an average of 1.66× and 1.68× in terms of time overhead, it improves by an average of 3.41×.</description><subject>Algorithms</subject><subject>Analysis</subject><subject>Chi-square test</subject><subject>Colds</subject><subject>Communication networks</subject><subject>Datasets</subject><subject>Effectiveness</subject><subject>Efficiency</subject><subject>Graph theory</subject><subject>Graphs</subject><subject>Labels</subject><subject>Malaria</subject><subject>Mathematical analysis</subject><subject>Pattern recognition</subject><subject>Probability</subject><subject>Search algorithms</subject><subject>Search theory</subject><subject>Similarity</subject><subject>Social 
networks</subject><subject>Statistics</subject><issn>2079-9292</issn><issn>2079-9292</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNptUMtOwzAQtBBIVKVfwMUS5xS_msexVFCQKoGUciVynHXjksSp7R769xjKgQO7h90d7cxIg9AtJXPOC3IPHajg7GCUp5xQQgt2gSaMZEVSsIJd_tmv0cz7PYlVUJ5zMkEfWzsmn3jt5Nji0vSmk86EEy5BOtXiZbez8W57_CA9NNgOeNWapDwcpQNcBhmMD9EYmwG_OVvL2nQ_yFnR36ArLTsPs985Re9Pj9vVc7J5Xb-slptEcUpDUgtNmpzkQBlj0OQgqVJUppoILSjNiZBKiUXd1EJIyCFnAgq-SBuRStC64VN0d9YdnT0cwYdqb49uiJYVKyhLBSOCxK_5-WsnO6jMoG1wUsVuoDfKDqBNxJdZFtNh2SKNBH4mKGe9d6Cr0ZleulNFSfUdfvVP-PwLlOB7Cg</recordid><startdate>20240101</startdate><enddate>20240101</enddate><creator>Chen, Ziyang</creator><creator>Zhuang, Junhao</creator><creator>Wang, Xuan</creator><creator>Tang, Xian</creator><creator>Yang, Kun</creator><creator>Du, Ming</creator><creator>Zhou, Junfeng</creator><general>MDPI AG</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L7M</scope><scope>P5Z</scope><scope>P62</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><orcidid>https://orcid.org/0000-0001-6494-5319</orcidid></search><sort><creationdate>20240101</creationdate><title>Top-k Graph Similarity Search Algorithm Based on Chi-Square Statistics in Probabilistic Graphs</title><author>Chen, Ziyang ; Zhuang, Junhao ; Wang, Xuan ; Tang, Xian ; Yang, Kun ; Du, Ming ; Zhou, 
Junfeng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c311t-b4f0d808e1222ed8ea1cc1a6f04f411804acc45bdb44ae8e824e9356d46aeffd3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Algorithms</topic><topic>Analysis</topic><topic>Chi-square test</topic><topic>Colds</topic><topic>Communication networks</topic><topic>Datasets</topic><topic>Effectiveness</topic><topic>Efficiency</topic><topic>Graph theory</topic><topic>Graphs</topic><topic>Labels</topic><topic>Malaria</topic><topic>Mathematical analysis</topic><topic>Pattern recognition</topic><topic>Probability</topic><topic>Search algorithms</topic><topic>Search theory</topic><topic>Similarity</topic><topic>Social networks</topic><topic>Statistics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Chen, Ziyang</creatorcontrib><creatorcontrib>Zhuang, Junhao</creatorcontrib><creatorcontrib>Wang, Xuan</creatorcontrib><creatorcontrib>Tang, Xian</creatorcontrib><creatorcontrib>Yang, Kun</creatorcontrib><creatorcontrib>Du, Ming</creatorcontrib><creatorcontrib>Zhou, Junfeng</creatorcontrib><collection>CrossRef</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection (ProQuest)</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>Advanced Technologies Database with 
Aerospace</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><jtitle>Electronics (Basel)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Chen, Ziyang</au><au>Zhuang, Junhao</au><au>Wang, Xuan</au><au>Tang, Xian</au><au>Yang, Kun</au><au>Du, Ming</au><au>Zhou, Junfeng</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Top-k Graph Similarity Search Algorithm Based on Chi-Square Statistics in Probabilistic Graphs</atitle><jtitle>Electronics (Basel)</jtitle><date>2024-01-01</date><risdate>2024</risdate><volume>13</volume><issue>1</issue><spage>192</spage><pages>192-</pages><issn>2079-9292</issn><eissn>2079-9292</eissn><abstract>Top-k graph similarity search on probabilistic graphs is widely used in various scenarios, such as symptom–disease diagnostics, community discovery, visual pattern recognition, and communication networks. The state-of-the-art method uses the chi-square statistics to speed up the process. The effectiveness of the chi-square statistics solution depends on the effectiveness of the sample observation and expectation. The existing method assumes that the labels in the data graphs are subject to uniform distribution and calculate the chi-square value based on this. In fact, however, the actual distribution of the labels does not meet the requirement of uniform distribution, resulting in a low quality of the returned results. To solve this problem, we propose a top-k similar subgraph search algorithm ChiSSA based on chi-square statistics. 
We propose two ways to calculate the expectation vector according to the actual distribution of labels in the graph, including the local expectation calculation method based on the vertex neighbors and the global expectation calculation method based on the label distribution of the whole graph. Furthermore, we propose two optimization strategies to improve the accuracy of query results and the efficiency of our algorithm. We conduct rich experiments on real datasets. The experimental results on real datasets show that our algorithm improves the quality and accuracy by an average of 1.66× and 1.68× in terms of time overhead, it improves by an average of 3.41×.</abstract><cop>Basel</cop><pub>MDPI AG</pub><doi>10.3390/electronics13010192</doi><orcidid>https://orcid.org/0000-0001-6494-5319</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 2079-9292
ispartof	Electronics (Basel), 2024-01, Vol.13 (1), p.192
issn	2079-9292 2079-9292
language	eng
recordid	cdi_proquest_journals_2912642040
source	MDPI - Multidisciplinary Digital Publishing Institute; EZB-FREE-00999 freely available EZB journals
subjects	Algorithms; Analysis; Chi-square test; Colds; Communication networks; Datasets; Effectiveness; Efficiency; Graph theory; Graphs; Labels; Malaria; Mathematical analysis; Pattern recognition; Probability; Search algorithms; Search theory; Similarity; Social networks; Statistics
title	Top-k Graph Similarity Search Algorithm Based on Chi-Square Statistics in Probabilistic Graphs
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T14%3A02%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Top-k%20Graph%20Similarity%20Search%20Algorithm%20Based%20on%20Chi-Square%20Statistics%20in%20Probabilistic%20Graphs&rft.jtitle=Electronics%20(Basel)&rft.au=Chen,%20Ziyang&rft.date=2024-01-01&rft.volume=13&rft.issue=1&rft.spage=192&rft.pages=192-&rft.issn=2079-9292&rft.eissn=2079-9292&rft_id=info:doi/10.3390/electronics13010192&rft_dat=%3Cgale_proqu%3EA779132756%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2912642040&rft_id=info:pmid/&rft_galeid=A779132756&rfr_iscdi=true