Top-k Graph Similarity Search Algorithm Based on Chi-Square Statistics in Probabilistic Graphs

Bibliographic details
Published in:	Electronics (Basel) 2024-01, Vol.13 (1), p.192
Main authors:	Chen, Ziyang; Zhuang, Junhao; Wang, Xuan; Tang, Xian; Yang, Kun; Du, Ming; Zhou, Junfeng
Format:	Article
Language:	English
Subjects:	Algorithms; Analysis; Chi-square test; Colds; Communication networks; Datasets; Effectiveness; Efficiency; Graph theory; Graphs; Labels; Malaria; Mathematical analysis; Pattern recognition; Probability; Search algorithms; Search theory; Similarity; Social networks; Statistics
Online access:	Full text

Description
Abstract:	Top-k graph similarity search on probabilistic graphs is widely used in scenarios such as symptom–disease diagnostics, community discovery, visual pattern recognition, and communication networks. The state-of-the-art method uses chi-square statistics to speed up the search. The effectiveness of a chi-square-based solution depends on the quality of the sample observation and expectation. The existing method assumes that the labels in the data graphs follow a uniform distribution and calculates the chi-square value on that basis. In practice, however, the labels are not uniformly distributed, which lowers the quality of the returned results. To solve this problem, we propose ChiSSA, a top-k similar subgraph search algorithm based on chi-square statistics. We propose two ways to calculate the expectation vector according to the actual distribution of labels in the graph: a local expectation calculation method based on vertex neighbors and a global expectation calculation method based on the label distribution of the whole graph. Furthermore, we propose two optimization strategies to improve the accuracy of query results and the efficiency of our algorithm. Experimental results on real datasets show that our algorithm improves quality and accuracy by an average of 1.66× and 1.68×, respectively; in terms of time overhead, it improves by an average of 3.41×.
ISSN:	2079-9292
DOI:	10.3390/electronics13010192
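
Illustrative note: the abstract contrasts a uniform expectation with one derived from the actual label distribution of the whole graph. The Python sketch below is not the ChiSSA algorithm; it only shows the standard Pearson chi-square statistic computed against both kinds of expectation vector. The abstract does not say how observation vectors are built, so the sketch assumes they are label counts in a vertex neighborhood. The function names and the toy data are hypothetical.

```python
from collections import Counter


def chi_square(observed, expected):
    """Pearson chi-square: sum of (O - E)^2 / E over categories with E > 0."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def observation(labels, sample_labels):
    """Count how often each label occurs in the sample (assumed: a neighborhood)."""
    counts = Counter(sample_labels)
    return [counts[label] for label in labels]


def uniform_expectation(labels, n):
    """Expected counts if all labels were equally likely."""
    return [n / len(labels)] * len(labels)


def global_expectation(labels, graph_labels, n):
    """Expected counts scaled from the label frequencies of the whole graph."""
    counts = Counter(graph_labels)
    total = len(graph_labels)
    return [n * counts[label] / total for label in labels]


# Toy data (hypothetical): a graph with a skewed label distribution.
labels = ["a", "b", "c"]
graph_labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
neighborhood = ["a", "a", "a", "b"]

obs = observation(labels, neighborhood)
n = len(neighborhood)
chi_uniform = chi_square(obs, uniform_expectation(labels, n))
chi_global = chi_square(obs, global_expectation(labels, graph_labels, n))
print(chi_uniform, chi_global)
```

In this toy case the neighborhood mirrors the skew of the whole graph, so the statistic is much smaller under the global expectation than under the uniform one. That gap illustrates why the choice of expectation vector changes which candidates look statistically unusual.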