Top-k Graph Similarity Search Algorithm Based on Chi-Square Statistics in Probabilistic Graphs
Top-k graph similarity search on probabilistic graphs is widely used in various scenarios, such as symptom–disease diagnostics, community discovery, visual pattern recognition, and communication networks. The state-of-the-art method uses chi-square statistics to speed up the process. The effecti...
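Published in: | Electronics (Basel) 2024-01, Vol.13 (1), p.192 |
---|---|
Main authors: | Chen, Ziyang; Zhuang, Junhao; Wang, Xuan; Tang, Xian; Yang, Kun; Du, Ming; Zhou, Junfeng |
Format: | Article |
Language: | eng |
Subjects: | Algorithms; Analysis; Chi-square test; Colds; Communication networks; Datasets; Effectiveness; Efficiency; Graph theory; Graphs; Labels; Malaria; Mathematical analysis; Pattern recognition; Probability; Search algorithms; Search theory; Similarity; Social networks; Statistics |
Online access: | Full text |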
container_end_page | |
---|---|
container_issue | 1 |
container_start_page | 192 |
container_title | Electronics (Basel) |
container_volume | 13 |
creator | Chen, Ziyang; Zhuang, Junhao; Wang, Xuan; Tang, Xian; Yang, Kun; Du, Ming; Zhou, Junfeng |
description | Top-k graph similarity search on probabilistic graphs is widely used in various scenarios, such as symptom–disease diagnostics, community discovery, visual pattern recognition, and communication networks. The state-of-the-art method uses chi-square statistics to speed up the process. The effectiveness of a chi-square-based solution depends on the quality of the sample observation and expectation. The existing method assumes that the labels in the data graphs follow a uniform distribution and calculates the chi-square value on that basis. In practice, however, the actual distribution of the labels is not uniform, which lowers the quality of the returned results. To solve this problem, we propose ChiSSA, a top-k similar subgraph search algorithm based on chi-square statistics. We propose two ways to calculate the expectation vector according to the actual distribution of labels in the graph: a local expectation calculation method based on the vertex neighbors, and a global expectation calculation method based on the label distribution of the whole graph. Furthermore, we propose two optimization strategies to improve the accuracy of query results and the efficiency of our algorithm. We conduct extensive experiments on real datasets. The results show that our algorithm improves quality and accuracy by an average of 1.66× and 1.68×, respectively; in terms of time overhead, it improves by an average of 3.41×. |
doi_str_mv | 10.3390/electronics13010192 |
format | Article |
fullrecord | sourceid: gale_proqu; recordid: TN_cdi_proquest_journals_2912642040; galeid: A779132756; pqid: 2912642040; sourceformat: XML; recordtype: article; publisher: Basel: MDPI AG; date: 2024-01-01; EISSN: 2079-9292; peer_reviewed; oa: free_for_read; orcid: https://orcid.org/0000-0001-6494-5319; rights: COPYRIGHT 2024 MDPI AG. 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
fulltext | fulltext |
identifier | ISSN: 2079-9292 |
ispartof | Electronics (Basel), 2024-01, Vol.13 (1), p.192 |
issn | 2079-9292 2079-9292 |
language | eng |
recordid | cdi_proquest_journals_2912642040 |
source | MDPI - Multidisciplinary Digital Publishing Institute; EZB-FREE-00999 freely available EZB journals |
subjects | Algorithms; Analysis; Chi-square test; Colds; Communication networks; Datasets; Effectiveness; Efficiency; Graph theory; Graphs; Labels; Malaria; Mathematical analysis; Pattern recognition; Probability; Search algorithms; Search theory; Similarity; Social networks; Statistics |
title | Top-k Graph Similarity Search Algorithm Based on Chi-Square Statistics in Probabilistic Graphs |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T14%3A02%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Top-k%20Graph%20Similarity%20Search%20Algorithm%20Based%20on%20Chi-Square%20Statistics%20in%20Probabilistic%20Graphs&rft.jtitle=Electronics%20(Basel)&rft.au=Chen,%20Ziyang&rft.date=2024-01-01&rft.volume=13&rft.issue=1&rft.spage=192&rft.pages=192-&rft.issn=2079-9292&rft.eissn=2079-9292&rft_id=info:doi/10.3390/electronics13010192&rft_dat=%3Cgale_proqu%3EA779132756%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2912642040&rft_id=info:pmid/&rft_galeid=A779132756&rfr_iscdi=true |
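
The abstract in this record contrasts a uniform label expectation with expectations derived from the actual label distribution, either globally (whole graph) or locally (vertex neighbors). The sketch below is not the ChiSSA algorithm itself, whose details are not given in this record; it only illustrates, under assumptions, how the choice of expectation vector changes a standard Pearson chi-square value. All function names, the toy graph, and the way counts are scaled are hypothetical choices made for illustration.

```python
from collections import Counter


def chi_square(observed, expected):
    """Pearson chi-square: sum over labels of (O - E)^2 / E.

    Labels with zero expected count are skipped to avoid division by zero;
    that is a simplification of this sketch, not something the record states.
    """
    return sum(
        (observed.get(label, 0) - e) ** 2 / e
        for label, e in expected.items()
        if e > 0
    )


def uniform_expectation(label_set, n):
    """Expected counts for n vertices if all labels were equally likely."""
    return {label: n / len(label_set) for label in label_set}


def global_expectation(vertex_labels, n):
    """Expected counts for n vertices, scaled from whole-graph label frequencies."""
    freq = Counter(vertex_labels.values())
    total = sum(freq.values())
    return {label: n * count / total for label, count in freq.items()}


def local_expectation(adjacency, vertex_labels, vertex, n):
    """Expected counts for n vertices, scaled from the labels of one vertex's
    neighbors. Assumes the vertex has at least one neighbor."""
    freq = Counter(vertex_labels[u] for u in adjacency[vertex])
    total = sum(freq.values())
    return {label: n * count / total for label, count in freq.items()}


# Toy data (hypothetical): label A dominates the graph.
vertex_labels = {"v1": "A", "v2": "A", "v3": "A", "v4": "B", "v5": "A", "v6": "C"}
adjacency = {
    "v1": ["v2", "v4"], "v2": ["v1", "v3"], "v3": ["v2"],
    "v4": ["v1", "v5"], "v5": ["v4", "v6"], "v6": ["v5"],
}

# Labels observed in a candidate region of three vertices (v1, v2, v4).
observed = Counter(["A", "A", "B"])
n = sum(observed.values())

chi_uniform = chi_square(observed, uniform_expectation({"A", "B", "C"}, n))
chi_global = chi_square(observed, global_expectation(vertex_labels, n))
chi_local = chi_square(observed, local_expectation(adjacency, vertex_labels, "v1", n))
print(chi_uniform, chi_global, chi_local)
```

On this toy input the uniform expectation yields a chi-square value of 2.0, while the global expectation yields 1.0: because label A is common in the graph, seeing two A labels is less surprising than the uniform assumption suggests. This is the kind of distortion the abstract attributes to the uniform-distribution assumption.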