Efficiently Approximating Top- k Sequential Patterns in Transactional Graphs

In real-world networks, an edge (or vertex) may be associated with a transaction database, where each transaction consists of a set of items to describe the attributes or behaviors of the edge (or vertex). This type of graph is named as a transactional graph, and the problem of mining sequential pat...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE access 2019, Vol.7, p.62817-62832
Hauptverfasser: Lei, Mingtao, Zhang, Xi, Yang, Jincui, Fang, Binxing
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:In real-world networks, an edge (or vertex) may be associated with a transaction database, where each transaction consists of a set of items to describe the attributes or behaviors of the edge (or vertex). This type of graph is named as a transactional graph, and the problem of mining sequential patterns from transactional graphs is to discover the subsequences of transaction sequences that frequently appear in many paths of the graph. Solving this problem can facilitate broad applications ranging from social network analysis to urban computing. However, this task is not trivial due to a large number of sequences existed in the graph. In particular, each path of the graph may induce multiple sequences of transactions, and gathering the transaction sequences induced by all the paths may lead to an extremely large volume. To efficiently approximate the top- k patterns, we propose a Parallelized Sampling-based Approach For Mining Top- k Sequential Patterns, PSMSP , which involves two key techniques: (a) a parallelized unbiased sequence sampling approach based on a sequence balanced graph partition strategy, and (b) a novel PSP-Tree structure to efficiently mine the patterns based on the anti-monotonicity properties. The experimental results on synthetic and real-world datasets demonstrate that the PSMSP can successfully find the sequential patterns with superior efficiency.
ISSN:2169-3536
2169-3536
DOI:10.1109/ACCESS.2019.2916811