Isomorphic Graph Embedding for Progressive Maximal Frequent Subgraph Mining

Maximal frequent subgraph mining (MFSM) is the task of mining only maximal frequent subgraphs, i.e., subgraphs that are not a part of other frequent subgraphs. Although many intelligent systems require MFSM, MFSM is challenging compared to frequent subgraph mining (FSM), as maximal frequent subgraph...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:ACM transactions on intelligent systems and technology 2023-12, Vol.15 (1), p.1-26, Article 9
Hauptverfasser: Nguyen, Thanh Toan, Nguyen, Thanh Tam, Nguyen, Thanh Hung, Yin, Hongzhi, Nguyen, Thanh Thi, Jo, Jun, Nguyen, Quoc Viet Hung
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Maximal frequent subgraph mining (MFSM) is the task of mining only maximal frequent subgraphs, i.e., subgraphs that are not a part of other frequent subgraphs. Although many intelligent systems require MFSM, MFSM is challenging compared to frequent subgraph mining (FSM), as maximal frequent subgraphs lie in the middle of graph lattice, and FSM algorithms must explore an exponential space and an NP-hard subroutine of frequency counting. Different from prior research, which primarily focused on optimal solutions, we introduce pmMine, a progressive graph neural framework designed for MFSM in a single large graph to attain an approximate solution. The framework combines isomorphic graph embedding, non-parametric partitioning, and an efficiently top-down pattern searching strategy. The critical insight that makes pmMine work is to define the concepts of rooted subgraph and isomorphic graph embedding, in which the costly isomorphism subroutine can be efficiently performed using similarity estimation in embedding space. In addition, pmMine returns the patterns identified during the mining process in a progressive manner. We validate the efficiency and effectiveness of our technique through extensive experiments on a variety of datasets spanning various domains.
ISSN:2157-6904
2157-6912
DOI:10.1145/3630635