Discovering frequent episodes and learning hidden Markov models: a formal connection

This paper establishes a formal connection between two common, but previously unconnected methods for analyzing data streams: discovering frequent episodes in a computer science framework and learning generative models in a statistics framework. We introduce a special class of discrete hidden Markov...

Bibliographic details
Published in:	IEEE transactions on knowledge and data engineering 2005-11, Vol.17 (11), p.1505-1517
Main authors:	Srivatsan Laxman; Sastry, P.S.; Unnikrishnan, K.P.
Format:	Article
Language:	eng
Subjects:	Algorithms; Application software; Applied sciences; Computer science; Computer science; control theory; systems; Computer simulation; Counting; Data analysis; Data mining; Exact sciences and technology; Frequency measurement; frequent episodes; Hidden Markov models; Index Terms- Temporal data mining; Information systems. Data bases; Joints; Learning; Mathematical models; Memory organisation. Data processing; Pattern analysis; sequential data; Software; Statistical analysis; statistical significance; Statistics; Stochastic processes; Streams; Studies; Time series analysis

container_end_page	1517
container_issue	11
container_start_page	1505
container_title	IEEE transactions on knowledge and data engineering
container_volume	17
creator	Srivatsan Laxman; Sastry, P.S.; Unnikrishnan, K.P.
description	This paper establishes a formal connection between two common, but previously unconnected methods for analyzing data streams: discovering frequent episodes in a computer science framework and learning generative models in a statistics framework. We introduce a special class of discrete hidden Markov models (HMMs), called episode generating HMMs (EGHs), and associate each episode with a unique EGH. We prove that, given any two episodes, the EGH that is more likely to generate a given data sequence is the one associated with the more frequent episode. To be able to establish such a relationship, we define a new measure of frequency of an episode, based on what we call nonoverlapping occurrences of the episode in the data. An efficient algorithm is proposed for counting the frequencies for a set of episodes. Through extensive simulations, we show that our algorithm is both effective and more efficient than current methods for frequent episode discovery. We also show how the association between frequent episodes and EGHs can be exploited to assess the significance of frequent episodes discovered and illustrate empirically how this idea may be used to improve the efficiency of the frequent episode discovery.
doi_str_mv	10.1109/TKDE.2005.181
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TKDE_2005_181</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>1512036</ieee_id><sourcerecordid>896204825</sourcerecordid><originalsourceid>FETCH-LOGICAL-c447t-6bd7d81f43f55d3daa99ecc621da9eceaadfd0f88ec009e440b1d997599e85153</originalsourceid><addsrcrecordid>eNqF0U1vGyEQBuBV1Uh10xxz6gVVSntah2HBQG9RnI8qiXJxzgjD0GyyBhfsSP33ZWVLkXpITozEwwyjt2mOgU4BqD5d3MwvpoxSMQUFH5oJCKFaBho-1ppyaHnH5afmcylPlFIlFUyaxbwvLr1g7uNvEjL-2WLcEFz3JXksxEZPBrQ5jtePvfcYyZ3Nz-mFrCoYyk9iSUh5ZQfiUozoNn2KX5qDYIeCR_vzsHm4vFicX7e391e_zs9uW8e53LSzpZdeQeBdEMJ33lqt0bkZA29rgdb64GlQCh2lGjmnS_BaS1GZEiC6w-bHru86p_rxsjGrug0Og42YtsUoPWOUKzbK729KphlwUN37UGpdZ0OF3_6DT2mbY13XaGBUcilZRe0OuZxKyRjMOvcrm_8aoGbMzIyZmTEzUzOr_mTf1BZnh5BtdH15fSShhqZVdV93rkfE12tRB3ez7h-UOZ89</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>912074772</pqid></control><display><type>article</type><title>Discovering frequent episodes and learning hidden Markov models: a formal connection</title><source>IEEE Electronic Library (IEL)</source><creator>Srivatsan Laxman ; Sastry, P.S. ; Unnikrishnan, K.P.</creator><creatorcontrib>Srivatsan Laxman ; Sastry, P.S. ; Unnikrishnan, K.P.</creatorcontrib><description>This paper establishes a formal connection between two common, but previously unconnected methods for analyzing data streams: discovering frequent episodes in a computer science framework and learning generative models in a statistics framework. We introduce a special class of discrete hidden Markov models (HMMs), called episode generating HMMs (EGHs), and associate each episode with a unique EGH. We prove that, given any two episodes, the EGH that is more likely to generate a given data sequence is the one associated with the more frequent episode. 
To be able to establish such a relationship, we define a new measure of frequency of an episode, based on what we call nonoverlapping occurrences of the episode in the data. An efficient algorithm is proposed for counting the frequencies for a set of episodes. Through extensive simulations, we show that our algorithm is both effective and more efficient than current methods for frequent episode discovery. We also show how the association between frequent episodes and EGHs can be exploited to assess the significance of frequent episodes discovered and illustrate empirically how this idea may be used to improve the efficiency of the frequent episode discovery.</description><identifier>ISSN: 1041-4347</identifier><identifier>EISSN: 1558-2191</identifier><identifier>DOI: 10.1109/TKDE.2005.181</identifier><identifier>CODEN: ITKEEH</identifier><language>eng</language><publisher>New York, NY: IEEE</publisher><subject>Algorithms ; Application software ; Applied sciences ; Computer science ; Computer science; control theory; systems ; Computer simulation ; Counting ; Data analysis ; Data mining ; Exact sciences and technology ; Frequency measurement ; frequent episodes ; Hidden Markov models ; Index Terms- Temporal data mining ; Information systems. Data bases ; Joints ; Learning ; Mathematical models ; Memory organisation. Data processing ; Pattern analysis ; sequential data ; Software ; Statistical analysis ; statistical significance ; Statistics ; Stochastic processes ; Streams ; Studies ; Time series analysis</subject><ispartof>IEEE transactions on knowledge and data engineering, 2005-11, Vol.17 (11), p.1505-1517</ispartof><rights>2006 INIST-CNRS</rights><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2005</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c447t-6bd7d81f43f55d3daa99ecc621da9eceaadfd0f88ec009e440b1d997599e85153</citedby><cites>FETCH-LOGICAL-c447t-6bd7d81f43f55d3daa99ecc621da9eceaadfd0f88ec009e440b1d997599e85153</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/1512036$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/1512036$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=17178198$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Srivatsan Laxman</creatorcontrib><creatorcontrib>Sastry, P.S.</creatorcontrib><creatorcontrib>Unnikrishnan, K.P.</creatorcontrib><title>Discovering frequent episodes and learning hidden Markov models: a formal connection</title><title>IEEE transactions on knowledge and data engineering</title><addtitle>TKDE</addtitle><description>This paper establishes a formal connection between two common, but previously unconnected methods for analyzing data streams: discovering frequent episodes in a computer science framework and learning generative models in a statistics framework. We introduce a special class of discrete hidden Markov models (HMMs), called episode generating HMMs (EGHs), and associate each episode with a unique EGH. We prove that, given any two episodes, the EGH that is more likely to generate a given data sequence is the one associated with the more frequent episode. 
To be able to establish such a relationship, we define a new measure of frequency of an episode, based on what we call nonoverlapping occurrences of the episode in the data. An efficient algorithm is proposed for counting the frequencies for a set of episodes. Through extensive simulations, we show that our algorithm is both effective and more efficient than current methods for frequent episode discovery. We also show how the association between frequent episodes and EGHs can be exploited to assess the significance of frequent episodes discovered and illustrate empirically how this idea may be used to improve the efficiency of the frequent episode discovery.</description><subject>Algorithms</subject><subject>Application software</subject><subject>Applied sciences</subject><subject>Computer science</subject><subject>Computer science; control theory; systems</subject><subject>Computer simulation</subject><subject>Counting</subject><subject>Data analysis</subject><subject>Data mining</subject><subject>Exact sciences and technology</subject><subject>Frequency measurement</subject><subject>frequent episodes</subject><subject>Hidden Markov models</subject><subject>Index Terms- Temporal data mining</subject><subject>Information systems. Data bases</subject><subject>Joints</subject><subject>Learning</subject><subject>Mathematical models</subject><subject>Memory organisation. 
Data processing</subject><subject>Pattern analysis</subject><subject>sequential data</subject><subject>Software</subject><subject>Statistical analysis</subject><subject>statistical significance</subject><subject>Statistics</subject><subject>Stochastic processes</subject><subject>Streams</subject><subject>Studies</subject><subject>Time series analysis</subject><issn>1041-4347</issn><issn>1558-2191</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2005</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNqF0U1vGyEQBuBV1Uh10xxz6gVVSntah2HBQG9RnI8qiXJxzgjD0GyyBhfsSP33ZWVLkXpITozEwwyjt2mOgU4BqD5d3MwvpoxSMQUFH5oJCKFaBho-1ppyaHnH5afmcylPlFIlFUyaxbwvLr1g7uNvEjL-2WLcEFz3JXksxEZPBrQ5jtePvfcYyZ3Nz-mFrCoYyk9iSUh5ZQfiUozoNn2KX5qDYIeCR_vzsHm4vFicX7e391e_zs9uW8e53LSzpZdeQeBdEMJ33lqt0bkZA29rgdb64GlQCh2lGjmnS_BaS1GZEiC6w-bHru86p_rxsjGrug0Og42YtsUoPWOUKzbK729KphlwUN37UGpdZ0OF3_6DT2mbY13XaGBUcilZRe0OuZxKyRjMOvcrm_8aoGbMzIyZmTEzUzOr_mTf1BZnh5BtdH15fSShhqZVdV93rkfE12tRB3ez7h-UOZ89</recordid><startdate>20051101</startdate><enddate>20051101</enddate><creator>Srivatsan Laxman</creator><creator>Sastry, P.S.</creator><creator>Unnikrishnan, K.P.</creator><general>IEEE</general><general>IEEE Computer Society</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7TB</scope><scope>FR3</scope><scope>F28</scope></search><sort><creationdate>20051101</creationdate><title>Discovering frequent episodes and learning hidden Markov models: a formal connection</title><author>Srivatsan Laxman ; Sastry, P.S. 
; Unnikrishnan, K.P.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c447t-6bd7d81f43f55d3daa99ecc621da9eceaadfd0f88ec009e440b1d997599e85153</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2005</creationdate><topic>Algorithms</topic><topic>Application software</topic><topic>Applied sciences</topic><topic>Computer science</topic><topic>Computer science; control theory; systems</topic><topic>Computer simulation</topic><topic>Counting</topic><topic>Data analysis</topic><topic>Data mining</topic><topic>Exact sciences and technology</topic><topic>Frequency measurement</topic><topic>frequent episodes</topic><topic>Hidden Markov models</topic><topic>Index Terms- Temporal data mining</topic><topic>Information systems. Data bases</topic><topic>Joints</topic><topic>Learning</topic><topic>Mathematical models</topic><topic>Memory organisation. Data processing</topic><topic>Pattern analysis</topic><topic>sequential data</topic><topic>Software</topic><topic>Statistical analysis</topic><topic>statistical significance</topic><topic>Statistics</topic><topic>Stochastic processes</topic><topic>Streams</topic><topic>Studies</topic><topic>Time series analysis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Srivatsan Laxman</creatorcontrib><creatorcontrib>Sastry, P.S.</creatorcontrib><creatorcontrib>Unnikrishnan, K.P.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science 
Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Engineering Research Database</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><jtitle>IEEE transactions on knowledge and data engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Srivatsan Laxman</au><au>Sastry, P.S.</au><au>Unnikrishnan, K.P.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Discovering frequent episodes and learning hidden Markov models: a formal connection</atitle><jtitle>IEEE transactions on knowledge and data engineering</jtitle><stitle>TKDE</stitle><date>2005-11-01</date><risdate>2005</risdate><volume>17</volume><issue>11</issue><spage>1505</spage><epage>1517</epage><pages>1505-1517</pages><issn>1041-4347</issn><eissn>1558-2191</eissn><coden>ITKEEH</coden><abstract>This paper establishes a formal connection between two common, but previously unconnected methods for analyzing data streams: discovering frequent episodes in a computer science framework and learning generative models in a statistics framework. We introduce a special class of discrete hidden Markov models (HMMs), called episode generating HMMs (EGHs), and associate each episode with a unique EGH. We prove that, given any two episodes, the EGH that is more likely to generate a given data sequence is the one associated with the more frequent episode. To be able to establish such a relationship, we define a new measure of frequency of an episode, based on what we call nonoverlapping occurrences of the episode in the data. 
An efficient algorithm is proposed for counting the frequencies for a set of episodes. Through extensive simulations, we show that our algorithm is both effective and more efficient than current methods for frequent episode discovery. We also show how the association between frequent episodes and EGHs can be exploited to assess the significance of frequent episodes discovered and illustrate empirically how this idea may be used to improve the efficiency of the frequent episode discovery.</abstract><cop>New York, NY</cop><pub>IEEE</pub><doi>10.1109/TKDE.2005.181</doi><tpages>13</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1041-4347
ispartof	IEEE transactions on knowledge and data engineering, 2005-11, Vol.17 (11), p.1505-1517
issn	1041-4347 1558-2191
language	eng
recordid	cdi_crossref_primary_10_1109_TKDE_2005_181
source	IEEE Electronic Library (IEL)
subjects	Algorithms; Application software; Applied sciences; Computer science; Computer science; control theory; systems; Computer simulation; Counting; Data analysis; Data mining; Exact sciences and technology; Frequency measurement; frequent episodes; Hidden Markov models; Index Terms- Temporal data mining; Information systems. Data bases; Joints; Learning; Mathematical models; Memory organisation. Data processing; Pattern analysis; sequential data; Software; Statistical analysis; statistical significance; Statistics; Stochastic processes; Streams; Studies; Time series analysis
title	Discovering frequent episodes and learning hidden Markov models: a formal connection
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T19%3A47%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Discovering%20frequent%20episodes%20and%20learning%20hidden%20Markov%20models:%20a%20formal%20connection&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Srivatsan%20Laxman&rft.date=2005-11-01&rft.volume=17&rft.issue=11&rft.spage=1505&rft.epage=1517&rft.pages=1505-1517&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2005.181&rft_dat=%3Cproquest_RIE%3E896204825%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=912074772&rft_id=info:pmid/&rft_ieee_id=1512036&rfr_iscdi=true
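
The frequency measure named in the abstract, the number of nonoverlapping occurrences of an episode in the data, can be illustrated with a minimal Python sketch. It assumes a serial episode (an ordered tuple of event types) and an untimed event sequence, and it uses a simple greedy left-to-right scan. The function name `count_nonoverlapping` and the sample stream are hypothetical, and this is not necessarily the counting algorithm proposed in the paper.

```python
from typing import Hashable, Sequence


def count_nonoverlapping(events: Sequence[Hashable],
                         episode: Sequence[Hashable]) -> int:
    """Count nonoverlapping occurrences of a serial episode in an event sequence.

    Illustrative assumption: the episode is an ordered sequence of event
    types, and an occurrence is any subsequence of `events` matching it.
    """
    if not episode:
        raise ValueError("episode must contain at least one event type")
    count = 0
    position = 0  # index of the next episode event type we are waiting for
    for event in events:
        if event == episode[position]:
            position += 1
            if position == len(episode):
                # One full occurrence has completed; start looking for the
                # next one only after it, so occurrences never interleave.
                count += 1
                position = 0
    return count


stream = list("ABCACBABC")  # hypothetical sample data
print(count_nonoverlapping(stream, ("A", "B")))
print(count_nonoverlapping(stream, ("A", "B", "C")))
```

Because the scan tracks a single partial match at a time, it runs in one pass over the sequence with constant memory per episode.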