HE-Gaston algorithm for frequent subgraph mining with Hadoop framework

Bibliographic details
Published in: Expert Systems with Applications, 2024-10, Vol. 251, p. 123971, Article 123971
Main authors: Jagannadha Rao, D.B., Kalpana, Parsi, Polepally, Vijayakumar, Nagendra Prabhu, S.
Format: Article
Language: English
Description
Abstract: Graph mining plays a key role in data mining, and it becomes more complicated as the size of the data increases. Identifying interesting subgraphs in a graph is a widely researched problem, where the subgraphs denote commonly occurring patterns that exhibit a particular structure. Frequent Subgraph Mining (FSM) is an important task for exploratory data analysis on graph data. Although many techniques have been proposed for FSM, the large dimension of the data makes FSM complex. This research proposes a novel technique for performing FSM in a Hadoop framework. FSM is carried out using the proposed Holoentropy Gaston algorithm (HE-Gaston), which is developed by replacing the Recurrent support measure in the Recurrent-Gaston (R-Gaston) technique with the Holoentropy support measure. Weblog files are used as the input for FSM and are fed to the Spark framework, which comprises a master and several slaves. The slave nodes generate frequent subgraphs based on the Holoentropy support measure, and these are passed to the master node, which produces the final frequent subgraphs using an aggregate Holoentropy support measure. HE-Gaston recorded an execution time of 43 ms, memory usage of 2.161 MB, and 117 mined structures.
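Illustration (not part of the catalog record): the two-stage flow described in the abstract, where slave nodes mine locally and the master aggregates, might be sketched roughly as follows in Python. This is a minimal sketch under several assumptions. The abstract does not define the Holoentropy support measure, so it is left as a pluggable function with plain relative frequency as a placeholder. The graph representation (a frozenset of labelled edges), the subset test standing in for subgraph matching, and the names worker_mine, master_aggregate, and relative_frequency are all hypothetical and do not come from the paper.

```python
from collections import Counter

# Assumption: a graph is a frozenset of labelled edges, and a pattern "occurs"
# in a graph when its edges are a subset of the graph's edges. Real FSM
# (including Gaston) needs subgraph-isomorphism checks; this is a simplification.

def relative_frequency(count, total):
    """Placeholder support measure. It stands in for the Holoentropy
    support measure, which the abstract does not define."""
    return count / total if total else 0.0

def worker_mine(partition, candidates, min_support, measure=relative_frequency):
    """Slave-node step: count candidate occurrences in one partition and
    return the counts of the locally frequent candidates."""
    counts = Counter()
    for graph in partition:
        for pattern in candidates:
            if pattern <= graph:  # subset test as a stand-in for matching
                counts[pattern] += 1
    return {p: c for p, c in counts.items()
            if measure(c, len(partition)) >= min_support}

def master_aggregate(worker_results, partition_sizes, min_support,
                     measure=relative_frequency):
    """Master-node step: sum the local counts and keep the patterns whose
    aggregate support reaches the threshold."""
    total = sum(partition_sizes)
    merged = Counter()
    for result in worker_results:
        merged.update(result)  # Counter.update adds counts
    return {p for p, c in merged.items() if measure(c, total) >= min_support}

# Toy weblog-style data: edges are (from_page, to_page) transitions.
partitions = [
    [frozenset({("home", "cart"), ("cart", "pay")}), frozenset({("home", "cart")})],
    [frozenset({("home", "cart"), ("cart", "pay")}), frozenset({("home", "about")})],
]
candidates = [
    frozenset({("home", "cart")}),
    frozenset({("cart", "pay")}),
    frozenset({("home", "cart"), ("cart", "pay")}),
]
local = [worker_mine(p, candidates, 0.5) for p in partitions]
frequent = master_aggregate(local, [len(p) for p in partitions], 0.5)
```

One consequence of this design is that a pattern dropped as locally infrequent on one node contributes nothing to the aggregate count, so the master's result can undercount. Whether and how HE-Gaston handles that case is not stated in the abstract.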
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2024.123971