Efficient and effective (k,P)-core-based community search over attributed heterogeneous information networks

Bibliographic details
Published in: Information Sciences, 2024-03, Vol. 661, p. 120076, Article 120076
Authors: Wang, Yuxiang; Gu, Chengjie; Xu, Xiaoliang; Zeng, Xinjun; Ke, Xiangyu; Wu, Tianxing
Format: Article
Language: English
Description
Abstract: Given a heterogeneous information network (HIN) G and a query node q, community search (CS) over an HIN identifies a cohesive subgraph of G that contains q. Although HINs with attributes on nodes (called AHINs) are prevalent today, CS over AHINs (CS-AHIN) has been ignored in the literature. One could convert an AHIN into an attributed homogeneous graph given a meta-path and then apply CS approaches for attributed homogeneous graphs to solve CS-AHIN, but this is problematic for two reasons. (1) Complete graph conversion is time-consuming and unnecessary, because the search involves only the query node's neighborhood, not the entire graph. (2) Existing attribute cohesiveness metrics are not strict enough to reflect substantial similarity between pairs of nodes in the community. To resolve these issues, we define the CS-AHIN problem atop a strict attribute cohesiveness metric that supports textual and numerical attributes simultaneously. We show that the problem is NP-hard. To address it, we propose an exact baseline that returns the globally optimal result. Then, we propose three heuristic algorithms built on a general greedy search framework to improve efficiency. Moreover, we present a cohesiveness-aware proximity graph-based index to further boost performance. Comprehensive experimental studies on various real-world datasets demonstrate our method's superiority.
Highlights:
• We define a CS-AHIN problem based on the (k,P)-core model, which is NP-hard.
• We present a general greedy search framework with three efficient heuristic methods.
• We improve the heuristic methods via a cohesiveness-aware index atop a proximity graph.
• Extensive experiments show our method's superiority in effectiveness and efficiency.
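The abstract names the building blocks but not their exact definitions, so the following Python sketch illustrates them only under stated assumptions. It expands P-neighbors on demand starting from q, which touches only nodes P-reachable from q instead of converting the whole AHIN into a homogeneous graph up front. It then peels nodes with fewer than k P-neighbors to obtain the (k,P)-core containing q, and finally runs a simple greedy loop that drops the least similar member and re-peels. The `HIN` class, `sim`, the weight `alpha`, and the min-pairwise `cohesiveness` score are hypothetical stand-ins: keywords are compared by Jaccard similarity, and numeric attributes are assumed to be normalized to [0, 1]. They are not the paper's metric, its three heuristics, or its index.

```python
from collections import defaultdict, deque
from itertools import combinations


class HIN:
    """Minimal undirected HIN with typed nodes (hypothetical helper, not from the paper)."""

    def __init__(self):
        self.adj = defaultdict(set)
        self.node_type = {}

    def add_edge(self, u, tu, v, tv):
        self.node_type[u], self.node_type[v] = tu, tv
        self.adj[u].add(v)
        self.adj[v].add(u)

    def p_neighbors(self, v, meta_path):
        """Nodes reached from v by following the node types in meta_path, e.g. ("A", "P", "A")."""
        frontier = {v}
        for t in meta_path[1:]:
            frontier = {w for u in frontier for w in self.adj[u] if self.node_type[w] == t}
        frontier.discard(v)
        return frontier


def collect_p_neighbors(g, q, meta_path):
    """BFS over P-neighbors from q: only nodes P-reachable from q are ever expanded."""
    nbrs, queue = {}, deque([q])
    while queue:
        v = queue.popleft()
        if v not in nbrs:
            nbrs[v] = g.p_neighbors(v, meta_path)
            queue.extend(w for w in nbrs[v] if w not in nbrs)
    return nbrs


def peel(nbrs, candidates, q, k):
    """(k,P)-core containing q within candidates: peel nodes with < k P-neighbors, keep q's component."""
    alive = set(candidates)
    deg = {v: len(nbrs[v] & alive) for v in alive}
    stack = [v for v in alive if deg[v] < k]
    while stack:
        v = stack.pop()
        if v in alive:
            alive.discard(v)
            for w in nbrs[v] & alive:
                deg[w] -= 1
                if deg[w] < k:
                    stack.append(w)
    if q not in alive:
        return set()
    comp, queue = {q}, deque([q])
    while queue:
        for w in nbrs[queue.popleft()] & alive:
            if w not in comp:
                comp.add(w)
                queue.append(w)
    return comp


def sim(a, b, alpha=0.5):
    """Hypothetical similarity: Jaccard on keyword sets, 1 - mean |diff| on numeric values in [0, 1]."""
    (kw_a, num_a), (kw_b, num_b) = a, b
    union = kw_a | kw_b
    text = len(kw_a & kw_b) / len(union) if union else 1.0
    num = 1.0 - sum(abs(x - y) for x, y in zip(num_a, num_b)) / len(num_a) if num_a else 1.0
    return alpha * text + (1 - alpha) * num


def cohesiveness(community, attrs):
    """Strict score: the minimum pairwise similarity, so one dissimilar member lowers it."""
    return min((sim(attrs[u], attrs[v]) for u, v in combinations(community, 2)), default=1.0)


def greedy_search(nbrs, attrs, q, k):
    """Drop the member with the lowest total similarity to the rest, re-peel, keep the best seen."""
    current = peel(nbrs, nbrs.keys(), q, k)
    best, best_score = set(current), cohesiveness(current, attrs)
    while len(current) > 1:
        worst = min((v for v in current if v != q),
                    key=lambda v: sum(sim(attrs[v], attrs[u]) for u in current if u != v))
        current = peel(nbrs, current - {worst}, q, k)
        if not current:
            break
        score = cohesiveness(current, attrs)
        if score > best_score:
            best, best_score = set(current), score
    return best


# Toy AHIN: authors a1-a4 co-wrote paper p1; a4 and a5 co-wrote paper p2.
g = HIN()
for a, p in [("a1", "p1"), ("a2", "p1"), ("a3", "p1"), ("a4", "p1"), ("a4", "p2"), ("a5", "p2")]:
    g.add_edge(a, "A", p, "P")
attrs = {a: ({"graph", "db"}, [0.5]) for a in ("a1", "a2", "a3")}
attrs.update({a: ({"vision"}, [0.9]) for a in ("a4", "a5")})
nbrs = collect_p_neighbors(g, "a1", ("A", "P", "A"))
print(sorted(peel(nbrs, nbrs.keys(), "a1", 2)))  # ['a1', 'a2', 'a3', 'a4']
print(sorted(greedy_search(nbrs, attrs, "a1", 2)))  # ['a1', 'a2', 'a3']
```

In the toy run, a5 has only one A-P-A neighbor and is peeled for k = 2. Because the score is a minimum over pairs, the single dissimilar member a4 caps the score of the whole (k,P)-core, so the greedy loop removes it and keeps {a1, a2, a3}. The sketch assumes a symmetric meta-path such as A-P-A, so that the P-neighbor relation is symmetric.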
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2023.120076