Discovering frequent episodes and learning hidden Markov models: a formal connection

Bibliographic details
Published in: IEEE Transactions on Knowledge and Data Engineering, 2005-11, Vol. 17 (11), pp. 1505-1517
Authors: Laxman, Srivatsan; Sastry, P.S.; Unnikrishnan, K.P.
Format: Article
Language: English
container_end_page 1517
container_issue 11
container_start_page 1505
container_title IEEE transactions on knowledge and data engineering
container_volume 17
creator Srivatsan Laxman
Sastry, P.S.
Unnikrishnan, K.P.
description This paper establishes a formal connection between two common but previously unconnected methods for analyzing data streams: discovering frequent episodes in a computer science framework and learning generative models in a statistics framework. We introduce a special class of discrete hidden Markov models (HMMs), called episode generating HMMs (EGHs), and associate each episode with a unique EGH. We prove that, given any two episodes, the EGH that is more likely to generate a given data sequence is the one associated with the more frequent episode. To establish this relationship, we define a new measure of the frequency of an episode, based on what we call nonoverlapping occurrences of the episode in the data. We propose an efficient algorithm for counting the frequencies of a set of episodes. Through extensive simulations, we show that our algorithm is both effective and more efficient than current methods for frequent episode discovery. We also show how the association between frequent episodes and EGHs can be exploited to assess the significance of the discovered frequent episodes, and we illustrate empirically how this idea may be used to improve the efficiency of frequent episode discovery.
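The nonoverlapping frequency mentioned in the abstract can be illustrated with a short Python sketch. The helper `count_nonoverlapping` is hypothetical and not taken from the paper; it assumes serial episodes (event types that must occur in a fixed order), events given as bare type labels without time stamps, and no expiry-time constraint. Each episode gets one automaton that advances greedily on the first matching event and restarts after every complete occurrence, so counted occurrences never interleave. This is one simple way to count several episodes in a single pass, not necessarily the counting algorithm proposed by the authors.

```python
from typing import Dict, List, Sequence, Tuple

# A serial episode: event types that must occur in this order, e.g. ("A", "B").
Episode = Tuple[str, ...]


def count_nonoverlapping(sequence: Sequence[str],
                         episodes: List[Episode]) -> Dict[Episode, int]:
    """Hypothetical helper (not the paper's code): count nonoverlapping
    occurrences of each serial episode in one left-to-right pass.

    Assumes untimed events (bare type labels) and no expiry-time constraint.
    """
    unique = list(dict.fromkeys(episodes))  # drop duplicate episodes, keep order
    if any(len(ep) == 0 for ep in unique):
        raise ValueError("episodes must be non-empty")

    counts = {ep: 0 for ep in unique}
    # One automaton per episode: index of the next event type it waits for.
    waiting = {ep: 0 for ep in unique}

    for event in sequence:
        for ep in unique:
            if ep[waiting[ep]] == event:
                waiting[ep] += 1            # advance greedily on the first match
                if waiting[ep] == len(ep):  # a complete occurrence was seen
                    counts[ep] += 1
                    waiting[ep] = 0         # restart: next occurrence starts later
    return counts


if __name__ == "__main__":
    stream = list("ABABCAB")
    print(count_nonoverlapping(stream, [("A", "B"), ("A", "C"), ("B", "A")]))
    # prints {('A', 'B'): 3, ('A', 'C'): 1, ('B', 'A'): 2}
```

For example, the sequence A, A, B contains two occurrences of A → B (using either A), but both end on the same B, so `count_nonoverlapping(list("AAB"), [("A", "B")])` counts only one. Restarting right after each completed occurrence always picks the earliest-ending occurrence still available, which is the usual exchange argument for why this greedy scheme does not undercount a single serial episode.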
doi_str_mv 10.1109/TKDE.2005.181
format Article
publisher New York, NY: IEEE
coden ITKEEH
rights 2006 INIST-CNRS
Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2005
link https://ieeexplore.ieee.org/document/1512036
backlink http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=17178198
fulltext fulltext_linktorsrc
identifier ISSN: 1041-4347
ispartof IEEE transactions on knowledge and data engineering, 2005-11, Vol.17 (11), p.1505-1517
issn 1041-4347
1558-2191
language eng
recordid cdi_crossref_primary_10_1109_TKDE_2005_181
source IEEE Electronic Library (IEL)
subjects Algorithms
Application software
Applied sciences
Computer science
Computer science; control theory; systems
Computer simulation
Counting
Data analysis
Data mining
Exact sciences and technology
Frequency measurement
frequent episodes
Hidden Markov models
Temporal data mining
Information systems. Data bases
Joints
Learning
Mathematical models
Memory organisation. Data processing
Pattern analysis
sequential data
Software
Statistical analysis
statistical significance
Statistics
Stochastic processes
Streams
Studies
Time series analysis
title Discovering frequent episodes and learning hidden Markov models: a formal connection
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T19%3A47%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Discovering%20frequent%20episodes%20and%20learning%20hidden%20Markov%20models:%20a%20formal%20connection&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Srivatsan%20Laxman&rft.date=2005-11-01&rft.volume=17&rft.issue=11&rft.spage=1505&rft.epage=1517&rft.pages=1505-1517&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2005.181&rft_dat=%3Cproquest_RIE%3E896204825%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=912074772&rft_id=info:pmid/&rft_ieee_id=1512036&rfr_iscdi=true