A MapReduce solution for incremental mining of sequential patterns from big data


Bibliographic details
Published in: Expert Systems with Applications, 2019-11, Vol. 133, pp. 109-125
Authors: Saleti, Sumalatha; R.B.V., Subramanyam
Format: Article
Language: English
Description
Abstract:
• A two-phase MapReduce algorithm is proposed for incremental mining of sequential patterns.
• Backward mining makes use of the knowledge obtained during the previous mining process.
• A Co-occurrence Reverse Map data structure efficiently generates the candidate sequences.
• Candidate generation rules avoid the generation of too many false candidates.
• Three novel early pruning properties are introduced based on the study of item co-occurrences.

Sequential Pattern Mining (SPM) is a popular data mining task with broad applications. With the advent of big data, traditional SPM algorithms are not scalable. Hence, many researchers have migrated to big data frameworks such as MapReduce and proposed distributed algorithms. However, the existing MapReduce algorithms assume the data to be static and do not handle incremental database updates; instead, they re-mine the entire updated database whenever new sequences are inserted. In this paper, we propose MR-INCSPM, an efficient distributed algorithm for incremental sequential pattern mining that uses the MapReduce framework and can handle big data. The proposed algorithm incorporates a backward mining approach that efficiently makes use of the knowledge obtained during the previous mining process. In addition, based on the study of item co-occurrences, we propose the Co-occurrence Reverse Map (CRMAP) data structure, which is used to address the combinatorial explosion of candidate sequences. Novel candidate generation and early pruning mechanisms are also designed using CRMAP to speed up the mining process. The proposed algorithm is evaluated on both real and synthetic datasets. The experimental results demonstrate the efficacy of MR-INCSPM with respect to processing time, memory usage and pruning efficiency.
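The abstract names co-occurrence-based candidate generation and early pruning without defining them. As a rough illustration of the general idea only, the sketch below builds a map from each item b to the items a for which the 2-sequence <a, b> is frequent, then uses it to discard candidate extensions that cannot be frequent. This is not the CRMAP structure from the paper: the "reverse" keying by the later item, the hypothetical function names build_reverse_cooccurrence_map and prune_candidates, and the simplification of sequences to lists of single items (real SPM uses itemsets) are all assumptions, and it runs on a single machine rather than on MapReduce.

```python
from __future__ import annotations

from collections import defaultdict


def build_reverse_cooccurrence_map(sequences: list[list[str]], min_support: int) -> dict[str, dict[str, int]]:
    """Hypothetical helper: map each item b to {a: support} for frequent 2-sequences <a, b>.

    Assumption: a sequence is a list of single items, and each sequence
    contributes at most once to the support of an ordered pair (a, b).
    """
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for seq in sequences:
        seen = set()
        # Collect every ordered pair (a before b) occurring in this sequence.
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                seen.add((a, b))
        for pair in seen:
            counts[pair] += 1
    # Keep only frequent pairs, keyed by the later item b ("reverse" direction).
    reverse_map: dict[str, dict[str, int]] = defaultdict(dict)
    for (a, b), sup in counts.items():
        if sup >= min_support:
            reverse_map[b][a] = sup
    return dict(reverse_map)


def prune_candidates(pattern: list[str], extension_items: list[str], reverse_map: dict[str, dict[str, int]]) -> list[str]:
    """Hypothetical helper: keep item b only if <a, b> is frequent for every a in pattern."""
    kept = []
    for b in extension_items:
        predecessors = reverse_map.get(b, {})
        if all(a in predecessors for a in pattern):
            kept.append(b)
    return kept


# Toy database (made up for illustration); <c, a> occurs only once, so it is infrequent.
SEQUENCES = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"], ["c", "a"]]
reverse_map = build_reverse_cooccurrence_map(SEQUENCES, min_support=2)
print(prune_candidates(["a"], ["a", "b", "c"], reverse_map))  # ['b', 'c']
```

The pruning is safe because support is anti-monotone: if extending a pattern P by item b yields a frequent sequence, then every 2-sequence <a, b> with a in P is a subsequence of it and must also be frequent, so any b failing the check can be discarded without counting its support.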
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2019.05.013