A scalable algorithm for mining maximal frequent sequences using sampling

Bibliographic details
Main authors: Luo, C., Chung, S.M.
Format: Conference proceeding
Language: eng
container_start_page 156
container_end_page 165
creator Luo, C.
Chung, S.M.
description We propose an efficient, scalable algorithm for mining Maximal Sequential Patterns using Sampling (MSPS). MSPS prunes much more of the search space than other algorithms because it applies both subsequence-infrequency-based pruning and supersequence-frequency-based pruning. In MSPS, a sampling technique is used to identify long frequent sequences early, instead of enumerating all of their subsequences. We propose a way to adjust the user-specified minimum support level when mining a sample of the database, to achieve better performance; this adjustment makes sampling more efficient when the minimum support is small. A signature technique is used for subsequence-infrequency-based pruning when the seed set of frequent sequences used for candidate generation is too large to fit in memory. A prefix tree structure is developed to count candidate sequences of different sizes during the database scan, and it also facilitates customer-sequence trimming. Our experiments show that MSPS performs very well and scales better than other algorithms.
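The two pruning rules named in the description can be illustrated with a small Python sketch (Python 3.9+). It assumes each sequence is a tuple of single items, whereas the paper works with customer sequences that may contain itemsets. Every name here (`is_subsequence`, `k_minus_1_subsequences`, `prune_candidates`, `maximal`) is hypothetical and not taken from MSPS. Subsequence-infrequency pruning drops a length-k candidate if any of its (k-1)-subsequences is not frequent. Supersequence-frequency pruning skips a candidate that is contained in a long sequence already verified as frequent, because such a candidate is frequent but cannot be maximal.

```python
from typing import Iterable

# Assumption: a sequence is a tuple of single items, not a tuple of itemsets.
Seq = tuple[str, ...]


def is_subsequence(sub: Seq, seq: Seq) -> bool:
    """True if the items of `sub` appear in `seq` in order (gaps allowed)."""
    it = iter(seq)
    # `item in it` advances the iterator past the match, preserving order.
    return all(item in it for item in sub)


def k_minus_1_subsequences(cand: Seq) -> set[Seq]:
    """All subsequences obtained by deleting exactly one element."""
    return {cand[:i] + cand[i + 1:] for i in range(len(cand))}


def prune_candidates(candidates: Iterable[Seq],
                     frequent_prev: set[Seq],
                     known_frequent_long: Iterable[Seq]) -> list[Seq]:
    """Keep only candidates that still need support counting (hypothetical helper)."""
    long_seqs = list(known_frequent_long)  # assumed verified frequent in the full database
    kept = []
    for cand in candidates:
        # Subsequence-infrequency pruning: an infrequent (k-1)-subsequence
        # means the candidate itself cannot be frequent.
        if not k_minus_1_subsequences(cand) <= frequent_prev:
            continue
        # Supersequence-frequency pruning: a frequent supersequence makes the
        # candidate frequent and non-maximal, so counting it is unnecessary.
        if any(is_subsequence(cand, s) for s in long_seqs):
            continue
        kept.append(cand)
    return kept


def maximal(frequent: Iterable[Seq]) -> list[Seq]:
    """Frequent sequences that are not a proper subsequence of another one."""
    fs = set(frequent)
    return sorted(s for s in fs
                  if not any(s != t and is_subsequence(s, t) for t in fs))
```

For example, with frequent 2-sequences `{ab, ac, bc, ad, bd}` and the known frequent sequence `abce`, the candidate `abc` is skipped by the supersequence rule, `bcd` is pruned because `cd` is infrequent, and only `abd` remains to be counted.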
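The description does not state the rule MSPS uses to adjust the minimum support for the sample. The sketch below therefore swaps in a different, standard idea: it lowers the threshold by a Hoeffding-bound margin, so that a sequence that is frequent in the whole database is unlikely to be missed in a random sample. The guarantee holds per sequence and assumes sampling with replacement. The function name, the `delta` parameter, and the floor of one supporting sequence are assumptions for illustration only.

```python
import math


def lowered_sample_minsup(minsup: float, sample_size: int, delta: float = 0.01) -> float:
    """Relative support threshold for mining a sample (hypothetical helper).

    Hoeffding: for one sequence with true support p, the sample support falls
    below p - eps with probability at most exp(-2 * n * eps**2). Setting that
    bound to `delta` gives eps = sqrt(ln(1/delta) / (2 * n)).
    This is NOT the MSPS adjustment rule, which the abstract does not give.
    """
    if not 0 < minsup <= 1:
        raise ValueError("minsup must be in (0, 1]")
    if sample_size <= 0 or not 0 < delta < 1:
        raise ValueError("sample_size must be positive and delta in (0, 1)")
    eps = math.sqrt(math.log(1 / delta) / (2 * sample_size))
    # Floor: a sequence must occur in at least one sampled customer sequence.
    return max(minsup - eps, 1 / sample_size)
```

With `minsup=0.05` and a sample of 10,000 sequences, the margin is about 0.015, so the sample would be mined at roughly 3.5% support. When `minsup` is close to or below the margin, the floor applies and the sample is mined at a very low support, which can be expensive. The paper's own adjustment is aimed at small minimum supports and may behave quite differently from this sketch.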
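Counting candidates of several lengths in one database pass can be sketched with a prefix tree in which each root-to-node path spells a candidate sequence. This Python version is an illustration under assumptions, not the MSPS data structure. It treats sequences as tuples of single items and omits customer-sequence trimming, and the `PrefixTree` class and its methods are hypothetical. For each customer sequence it follows only the leftmost match of each child item. That is sufficient because the leftmost match leaves the longest remaining suffix. As a result, each node is reached at most once per sequence, so no candidate is counted twice.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class Node:
    children: dict[str, "Node"] = field(default_factory=dict)
    is_candidate: bool = False  # True if the path from the root is a candidate
    count: int = 0              # number of customer sequences containing it


class PrefixTree:
    """Hypothetical prefix tree for counting candidates of mixed lengths."""

    def __init__(self, candidates: Iterable[tuple[str, ...]]):
        self.root = Node()
        for cand in candidates:
            node = self.root
            for item in cand:
                node = node.children.setdefault(item, Node())
            node.is_candidate = True

    def count_sequence(self, seq: tuple[str, ...]) -> None:
        """Add 1 to every candidate contained (in order) in `seq`."""
        def walk(node: Node, start: int) -> None:
            for item, child in node.children.items():
                try:
                    # Leftmost match leaves the longest suffix for extensions.
                    pos = seq.index(item, start)
                except ValueError:
                    continue  # item absent from the rest of seq: skip subtree
                if child.is_candidate:
                    child.count += 1
                walk(child, pos + 1)
        walk(self.root, 0)

    def counts(self) -> dict[tuple[str, ...], int]:
        """Map each candidate sequence to its support count."""
        out: dict[tuple[str, ...], int] = {}
        def walk(node: Node, path: tuple[str, ...]) -> None:
            if node.is_candidate:
                out[path] = node.count
            for item, child in node.children.items():
                walk(child, path + (item,))
        walk(self.root, ())
        return out
```

A usage sketch: build the tree once from the candidate set, call `count_sequence` for each customer sequence during a single scan, then compare `counts()` against the absolute minimum support.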
doi_str_mv 10.1109/ICTAI.2004.16
format Conference Proceeding
isbn 9780769522364
076952236X
publisher IEEE, Los Alamitos, CA
pages 156-165 (10 pages)
rights 2007 INIST-CNRS
link https://ieeexplore.ieee.org/document/1374182
link http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=19103830
identifier ISSN: 1082-3409
ispartof 16th IEEE International Conference on Tools with Artificial Intelligence, 2004, p.156-165
issn 1082-3409
2375-0197
language eng
recordid cdi_ieee_primary_1374182
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Applied sciences
Artificial intelligence
Association rules
Computer science
Computer science; control theory; systems
Costs
Data analysis
Data mining
Exact sciences and technology
Frequency
Sampling methods
Scalability
Tree data structures
title A scalable algorithm for mining maximal frequent sequences using sampling
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T15%3A17%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pascalfrancis_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20scalable%20algorithm%20for%20mining%20maximal%20frequent%20sequences%20using%20sampling&rft.btitle=16th%20IEEE%20International%20Conference%20on%20Tools%20with%20Artificial%20Intelligence&rft.au=Luo,%20C.&rft.date=2004&rft.spage=156&rft.epage=165&rft.pages=156-165&rft.issn=1082-3409&rft.eissn=2375-0197&rft.isbn=9780769522364&rft.isbn_list=076952236X&rft_id=info:doi/10.1109/ICTAI.2004.16&rft_dat=%3Cpascalfrancis_6IE%3E19103830%3C/pascalfrancis_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=1374182&rfr_iscdi=true