Discovering frequent episodes and learning hidden Markov models: a formal connection

Bibliographic details
Published in:	IEEE Transactions on Knowledge and Data Engineering, 2005-11, Vol. 17 (11), pp. 1505-1517
Main authors:	Laxman, Srivatsan; Sastry, P.S.; Unnikrishnan, K.P.
Format:	Article
Language:	English
Subjects:	Algorithms; Application software; Applied sciences; Computer science; Computer science control theory systems; Computer simulation; Counting; Data analysis; Data mining; Exact sciences and technology; Frequency measurement; Frequent episodes; Hidden Markov models; Temporal data mining; Information systems. Data bases; Joints; Learning; Mathematical models; Memory organisation. Data processing; Pattern analysis; Sequential data; Software; Statistical analysis; Statistical significance; Statistics; Stochastic processes; Streams; Studies; Time series analysis

Description
Abstract:	This paper establishes a formal connection between two common but previously unconnected methods for analyzing data streams: discovering frequent episodes in a computer science framework and learning generative models in a statistics framework. We introduce a special class of discrete hidden Markov models (HMMs), called episode generating HMMs (EGHs), and associate each episode with a unique EGH. We prove that, given any two episodes, the EGH that is more likely to generate a given data sequence is the one associated with the more frequent episode. To establish this relationship, we define a new measure of the frequency of an episode, based on what we call nonoverlapping occurrences of the episode in the data. An efficient algorithm is proposed for counting the frequencies of a set of episodes. Through extensive simulations, we show that our algorithm is both effective and more efficient than current methods for frequent episode discovery. We also show how the association between frequent episodes and EGHs can be exploited to assess the significance of the frequent episodes discovered, and we illustrate empirically how this idea may be used to improve the efficiency of frequent episode discovery.
ISSN:	1041-4347 1558-2191
DOI:	10.1109/TKDE.2005.181
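
Illustration:	The abstract defines episode frequency through nonoverlapping occurrences. The Python sketch below shows one simple way such a count could be computed for a single serial episode (an ordered list of event types), using a greedy left-to-right scan that restarts after each completed occurrence. It is an assumption-laden illustration, not the algorithm proposed in the paper: the function name `count_nonoverlapping`, the restriction to one serial episode, and the representation of the data stream as a plain sequence of event types without timestamps are all hypothetical choices made here.

```python
from typing import Hashable, Sequence


def count_nonoverlapping(events: Sequence[Hashable],
                         episode: Sequence[Hashable]) -> int:
    """Greedily count nonoverlapping occurrences of a serial episode.

    Hypothetical helper for illustration only; it is not taken from the paper.
    `events` is the data stream as an ordered sequence of event types, and
    `episode` is the ordered sequence of event types to look for.
    """
    if not episode:
        raise ValueError("episode must contain at least one event type")

    count = 0
    position = 0  # index of the next episode event type we are waiting for
    for event in events:
        if event == episode[position]:
            position += 1
            if position == len(episode):
                # A full occurrence has been seen: record it and start over,
                # so the next occurrence cannot share events with this one.
                count += 1
                position = 0
    return count


# Usage: the stream A B A B C A B C contains two nonoverlapping
# occurrences of the serial episode A -> B -> C.
print(count_nonoverlapping(list("ABABCABC"), list("ABC")))  # prints 2
```

Under this sketch, comparing two episodes by frequency amounts to comparing the two counts on the same stream, which is the quantity the abstract relates to the likelihood under the associated EGHs.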