Frequent pattern discovery with tri-partition alphabets
Published in: | Information Sciences 2020-01, Vol.507, p.715-732 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Full text |
Abstract: | • A general and flexible type of sequence pattern, called a tri-pattern, is introduced. • The relationships among the new type and four existing types are analyzed. • An efficient Apriori algorithm for frequent tri-pattern discovery is designed. • A comparison across three application areas shows the superiority of tri-patterns.
The concept of patterns is the basis of sequence analysis. There are various pattern definitions for biological data, texts, and time series. Inspired by the methodology of three-way decisions and protein tri-partition, this paper proposes a frequent pattern discovery algorithm for a new type of pattern by dividing the alphabet into strong, medium, and weak parts. The new type, called a tri-pattern, is more general and flexible than existing ones and is therefore more interesting in applications. Experiments were undertaken on data in various fields to reveal the universality of this new pattern. These include protein sequence mining, petroleum production time series analysis, and forged Chinese text keyword mining. The results show that tri-patterns are more meaningful and desirable than the existing four types of patterns. This study enriches the semantics of sequential pattern discovery and the application fields of three-way decisions. |
---|---|
ISSN: | 0020-0255 1872-6291 |
DOI: | 10.1016/j.ins.2018.04.013 |