Discovering frequent episodes and learning hidden Markov models: a formal connection
This paper establishes a formal connection between two common, but previously unconnected methods for analyzing data streams: discovering frequent episodes in a computer science framework and learning generative models in a statistics framework. We introduce a special class of discrete hidden Markov...
Published in: | IEEE transactions on knowledge and data engineering 2005-11, Vol.17 (11), p.1505-1517 |
---|---|
Main authors: | Srivatsan Laxman ; Sastry, P.S. ; Unnikrishnan, K.P. |
Format: | Article |
Language: | eng |
Subjects: | |
container_end_page | 1517 |
---|---|
container_issue | 11 |
container_start_page | 1505 |
container_title | IEEE transactions on knowledge and data engineering |
container_volume | 17 |
creator | Srivatsan Laxman ; Sastry, P.S. ; Unnikrishnan, K.P. |
description | This paper establishes a formal connection between two common, but previously unconnected methods for analyzing data streams: discovering frequent episodes in a computer science framework and learning generative models in a statistics framework. We introduce a special class of discrete hidden Markov models (HMMs), called episode generating HMMs (EGHs), and associate each episode with a unique EGH. We prove that, given any two episodes, the EGH that is more likely to generate a given data sequence is the one associated with the more frequent episode. To be able to establish such a relationship, we define a new measure of frequency of an episode, based on what we call nonoverlapping occurrences of the episode in the data. An efficient algorithm is proposed for counting the frequencies for a set of episodes. Through extensive simulations, we show that our algorithm is both effective and more efficient than current methods for frequent episode discovery. We also show how the association between frequent episodes and EGHs can be exploited to assess the significance of frequent episodes discovered and illustrate empirically how this idea may be used to improve the efficiency of the frequent episode discovery. |
doi_str_mv | 10.1109/TKDE.2005.181 |
format | Article |
coden | ITKEEH |
collection | IEEE All-Society Periodicals Package (ASPP) 2005-present ; IEEE All-Society Periodicals Package (ASPP) 1998-Present ; IEEE Electronic Library (IEL) ; Pascal-Francis ; CrossRef ; Computer and Information Systems Abstracts ; Electronics & Communications Abstracts ; Technology Research Database ; ProQuest Computer Science Collection ; Advanced Technologies Database with Aerospace ; Computer and Information Systems Abstracts Academic ; Computer and Information Systems Abstracts Professional ; Mechanical & Transportation Engineering Abstracts ; Engineering Research Database ; ANTE: Abstracts in New Technology & Engineering |
creationdate | 2005-11-01 |
eissn | 1558-2191 |
ieee_id | 1512036 |
linktohtml | https://ieeexplore.ieee.org/document/1512036 |
oa | free_for_read |
publisher | New York, NY: IEEE |
rights | 2006 INIST-CNRS ; Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2005 |
toplevel | peer_reviewed ; online_resources |
tpages | 13 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1041-4347 |
ispartof | IEEE transactions on knowledge and data engineering, 2005-11, Vol.17 (11), p.1505-1517 |
issn | 1041-4347 ; 1558-2191 |
language | eng |
recordid | cdi_crossref_primary_10_1109_TKDE_2005_181 |
source | IEEE Electronic Library (IEL) |
subjects | Algorithms ; Application software ; Applied sciences ; Computer science ; Computer science; control theory; systems ; Computer simulation ; Counting ; Data analysis ; Data mining ; Exact sciences and technology ; Frequency measurement ; frequent episodes ; Hidden Markov models ; Index Terms- Temporal data mining ; Information systems. Data bases ; Joints ; Learning ; Mathematical models ; Memory organisation. Data processing ; Pattern analysis ; sequential data ; Software ; Statistical analysis ; statistical significance ; Statistics ; Stochastic processes ; Streams ; Studies ; Time series analysis |
title | Discovering frequent episodes and learning hidden Markov models: a formal connection |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T19%3A47%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Discovering%20frequent%20episodes%20and%20learning%20hidden%20Markov%20models:%20a%20formal%20connection&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Srivatsan%20Laxman&rft.date=2005-11-01&rft.volume=17&rft.issue=11&rft.spage=1505&rft.epage=1517&rft.pages=1505-1517&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2005.181&rft_dat=%3Cproquest_RIE%3E896204825%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=912074772&rft_id=info:pmid/&rft_ieee_id=1512036&rfr_iscdi=true |
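
The description above defines episode frequency in terms of nonoverlapping occurrences. The record does not reproduce the paper's counting algorithm, so the following is only a minimal sketch of the idea, under several assumptions: the episode is serial (its event types must appear in the given order), events carry no timestamps or expiry constraints, and two occurrences are nonoverlapping when one finishes before the next begins. The function name `count_nonoverlapping` and its parameters are hypothetical, chosen for illustration.

```python
from typing import Hashable, Iterable, Sequence


def count_nonoverlapping(events: Iterable[Hashable],
                         episode: Sequence[Hashable]) -> int:
    """Count nonoverlapping occurrences of a serial episode in an event stream.

    Illustrative sketch only (not the published algorithm): a single
    automaton waits for the episode's event types in order and resets
    once a full occurrence has been seen.
    """
    if not episode:
        raise ValueError("episode must contain at least one event type")

    count = 0  # completed occurrences so far
    pos = 0    # index of the next event type the automaton is waiting for
    for event in events:
        if event == episode[pos]:
            pos += 1
            if pos == len(episode):
                # A full occurrence ended here; start looking for the next
                # one strictly after this event, so occurrences never overlap.
                count += 1
                pos = 0
    return count


if __name__ == "__main__":
    stream = list("AXBABB")
    print(count_nonoverlapping(stream, ("A", "B")))
```

Because the automaton always completes the earliest-finishing occurrence before restarting, a single pass over the stream suffices and the memory needed per episode is one counter and one position.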