Fast Utility Mining on Sequence Data

Bibliographic details
Published in: IEEE transactions on cybernetics, 2021-02, Vol.51 (2), p.487-500
Main authors: Gan, Wensheng, Lin, Jerry Chun-Wei, Zhang, Jiexiong, Fournier-Viger, Philippe, Chao, Han-Chieh, Yu, Philip S.
Format: Article
Language: eng
Keywords:
Online access: Order full text
container_end_page 500
container_issue 2
container_start_page 487
container_title IEEE transactions on cybernetics
container_volume 51
creator Gan, Wensheng
Lin, Jerry Chun-Wei
Zhang, Jiexiong
Fournier-Viger, Philippe
Chao, Han-Chieh
Yu, Philip S.
description High-utility sequential pattern (HUSP) mining is an emerging topic in the field of knowledge discovery in databases. It consists of discovering subsequences that have a high utility (importance) in sequences, which can be referred to as HUSPs. HUSPs can be applied to many real-life applications, such as market basket analysis, e-commerce recommendations, click-stream analysis, and route planning. Several algorithms have been proposed to efficiently mine utility-based useful sequential patterns. However, due to the combinatorial explosion of the search space for low utility threshold and large-scale data, the performances of these algorithms are unsatisfactory in terms of runtime and memory usage. Hence, this article proposes an efficient algorithm for the task of HUSP mining, called HUSP mining with UL-list (HUSP-ULL). It utilizes a lexicographic q-sequence (LQS)-tree and a utility-linked (UL)-list structure to quickly discover HUSPs. Furthermore, two pruning strategies are introduced in HUSP-ULL to obtain tight upper bounds on the utility of the candidate sequences and reduce the search space by pruning unpromising candidates early. Substantial experiments on both real-life and synthetic datasets showed that HUSP-ULL can effectively and efficiently discover the complete set of HUSPs and that it outperforms the state-of-the-art algorithms.
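To make the notion of a pattern's "utility" in the description above concrete, here is a minimal brute-force sketch. It assumes the definition commonly used in the HUSP literature: each item occurrence has utility quantity × unit profit, a pattern's utility in one q-sequence is the maximum over all of its occurrences, and its utility in the database is the sum over the sequences that contain it. It also computes the sequence-weighted utility (SWU), a classic and fairly loose upper bound from earlier work. This is NOT the HUSP-ULL algorithm: it uses no LQS-tree, no UL-list and none of the article's tighter pruning bounds. All names (`QSequence`, `utility_in_sequence`, `swu`, `PROFIT`, `DB`) and the toy data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical representation, for illustration only:
# a q-sequence is a list of itemsets, each mapping item -> quantity;
# a pattern is a list of frozensets (ordered itemsets).
QSequence = List[Dict[str, int]]
Pattern = List[FrozenSet[str]]


def sequence_utility(seq: QSequence, profit: Dict[str, int]) -> int:
    """Total utility of a q-sequence: sum of quantity * unit profit."""
    return sum(q * profit[item] for itemset in seq for item, q in itemset.items())


def utility_in_sequence(pattern: Pattern, seq: QSequence,
                        profit: Dict[str, int]) -> Optional[int]:
    """Maximum utility over all occurrences of `pattern` in `seq`.

    Returns None when the pattern does not occur. Exhaustive search,
    exponential in the worst case; suitable for toy data only.
    """
    best: Optional[int] = None

    def search(k: int, start: int, acc: int) -> None:
        nonlocal best
        if k == len(pattern):  # every pattern itemset has been matched
            if best is None or acc > best:
                best = acc
            return
        for j in range(start, len(seq)):
            itemset = seq[j]
            if pattern[k].issubset(itemset):  # pattern itemset fits here
                gain = sum(itemset[i] * profit[i] for i in pattern[k])
                search(k + 1, j + 1, acc + gain)  # next match strictly later

    search(0, 0, 0)
    return best


def pattern_utility(pattern: Pattern, db: List[QSequence],
                    profit: Dict[str, int]) -> int:
    """Utility in the database: sum over the sequences containing the pattern."""
    total = 0
    for seq in db:
        u = utility_in_sequence(pattern, seq, profit)
        if u is not None:
            total += u
    return total


def swu(pattern: Pattern, db: List[QSequence], profit: Dict[str, int]) -> int:
    """Sequence-weighted utility: total utility of every sequence containing
    the pattern, a classic upper bound on the pattern's utility."""
    return sum(sequence_utility(seq, profit) for seq in db
               if utility_in_sequence(pattern, seq, profit) is not None)


# Hypothetical toy data (not taken from the article).
PROFIT = {"a": 2, "b": 5, "c": 1}
DB: List[QSequence] = [
    [{"a": 1, "b": 2}, {"c": 3}, {"a": 2}],
    [{"b": 1}, {"a": 3, "c": 1}],
    [{"c": 2}, {"b": 1}],
]

if __name__ == "__main__":
    b_then_a = [frozenset({"b"}), frozenset({"a"})]
    print(pattern_utility(b_then_a, DB, PROFIT), swu(b_then_a, DB, PROFIT))  # 25 31
```

For the pattern ⟨b, a⟩ the utility is 14 + 11 = 25 (the third sequence has no `a` after `b`), while SWU gives 19 + 12 = 31. The gap between the two numbers is exactly why tighter upper bounds, such as those the article proposes, let a miner prune unpromising candidates earlier.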
doi_str_mv 10.1109/TCYB.2020.2970176
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TCYB_2020_2970176</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9018003</ieee_id><sourcerecordid>2374316285</sourcerecordid><originalsourceid>FETCH-LOGICAL-c392t-9cbd621c48e12411865553014c8f840a682da34f150f4886f890c60ad6bb94f23</originalsourceid><addsrcrecordid>eNpdkD1PwzAQhi0EolXpD0BIKBIMLC2-80fsEQoFpCIG2oHJchwHpUqTEidD_z2JWjrg5Szfc-9ZDyGXQKcAVN8vZ1-PU6RIp6hjCrE8IUMEqSaIsTg93mU8IOMQ1rQ7qnvS6pwMGAJHLvmQ3M5taKJVkxd5s4ve8zIvv6OqjD79T-tL56Mn29gLcpbZIvjxoY7Iav68nL1OFh8vb7OHxcQxjc1EuySVCI4rD8gBlBRCMArcqUxxaqXC1DKegaAZV0pmSlMnqU1lkmieIRuRu33utq669aExmzw4XxS29FUbDLKYM5CoRIfe_EPXVVuX3e8M8lgpJmLRB8KecnUVQu0zs63zja13BqjpLZreouktmoPFbub6kNwmG58eJ_6cdcDVHsi998e2pqAoZewXsIJxYg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2478835752</pqid></control><display><type>article</type><title>Fast Utility Mining on Sequence Data</title><source>IEEE/IET Electronic Library (IEL)</source><creator>Gan, Wensheng ; Lin, Jerry Chun-Wei ; Zhang, Jiexiong ; Fournier-Viger, Philippe ; Chao, Han-Chieh ; Yu, Philip S.</creator><creatorcontrib>Gan, Wensheng ; Lin, Jerry Chun-Wei ; Zhang, Jiexiong ; Fournier-Viger, Philippe ; Chao, Han-Chieh ; Yu, Philip S.</creatorcontrib><description>High-utility sequential pattern (HUSP) mining is an emerging topic in the field of knowledge discovery in databases. It consists of discovering subsequences that have a high utility (importance) in sequences, which can be referred to as HUSPs. HUSPs can be applied to many real-life applications, such as market basket analysis, e-commerce recommendations, click-stream analysis, and route planning. Several algorithms have been proposed to efficiently mine utility-based useful sequential patterns. 
However, due to the combinatorial explosion of the search space for low utility threshold and large-scale data, the performances of these algorithms are unsatisfactory in terms of runtime and memory usage. Hence, this article proposes an efficient algorithm for the task of HUSP mining, called HUSP mining with UL-list (HUSP-ULL). It utilizes a lexicographic &lt;inline-formula&gt; &lt;tex-math notation="LaTeX"&gt;q &lt;/tex-math&gt;&lt;/inline-formula&gt;-sequence (LQS)-tree and a utility-linked (UL)-list structure to quickly discover HUSPs. Furthermore, two pruning strategies are introduced in HUSP-ULL to obtain tight upper bounds on the utility of the candidate sequences and reduce the search space by pruning unpromising candidates early. Substantial experiments on both real-life and synthetic datasets showed that HUSP-ULL can effectively and efficiently discover the complete set of HUSPs and that it outperforms the state-of-the-art algorithms.</description><identifier>ISSN: 2168-2267</identifier><identifier>EISSN: 2168-2275</identifier><identifier>DOI: 10.1109/TCYB.2020.2970176</identifier><identifier>PMID: 32142464</identifier><identifier>CODEN: ITCEB8</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Algorithms ; Combinatorial analysis ; Computer science ; Cybernetics ; Data mining ; Economic behavior ; Gallium nitride ; Itemsets ; linked-list structure ; Mining ; Route planning ; sequence ; Upper bound ; Upper bounds ; utility mining ; utility theory</subject><ispartof>IEEE transactions on cybernetics, 2021-02, Vol.51 (2), p.487-500</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c392t-9cbd621c48e12411865553014c8f840a682da34f150f4886f890c60ad6bb94f23</citedby><cites>FETCH-LOGICAL-c392t-9cbd621c48e12411865553014c8f840a682da34f150f4886f890c60ad6bb94f23</cites><orcidid>0000-0002-5781-8116 ; 0000-0001-8768-9709 ; 0000-0002-7680-9899</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9018003$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>315,781,785,797,27926,27927,54760</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9018003$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/32142464$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Gan, Wensheng</creatorcontrib><creatorcontrib>Lin, Jerry Chun-Wei</creatorcontrib><creatorcontrib>Zhang, Jiexiong</creatorcontrib><creatorcontrib>Fournier-Viger, Philippe</creatorcontrib><creatorcontrib>Chao, Han-Chieh</creatorcontrib><creatorcontrib>Yu, Philip S.</creatorcontrib><title>Fast Utility Mining on Sequence Data</title><title>IEEE transactions on cybernetics</title><addtitle>TCYB</addtitle><addtitle>IEEE Trans Cybern</addtitle><description>High-utility sequential pattern (HUSP) mining is an emerging topic in the field of knowledge discovery in databases. It consists of discovering subsequences that have a high utility (importance) in sequences, which can be referred to as HUSPs. HUSPs can be applied to many real-life applications, such as market basket analysis, e-commerce recommendations, click-stream analysis, and route planning. Several algorithms have been proposed to efficiently mine utility-based useful sequential patterns. 
However, due to the combinatorial explosion of the search space for low utility threshold and large-scale data, the performances of these algorithms are unsatisfactory in terms of runtime and memory usage. Hence, this article proposes an efficient algorithm for the task of HUSP mining, called HUSP mining with UL-list (HUSP-ULL). It utilizes a lexicographic &lt;inline-formula&gt; &lt;tex-math notation="LaTeX"&gt;q &lt;/tex-math&gt;&lt;/inline-formula&gt;-sequence (LQS)-tree and a utility-linked (UL)-list structure to quickly discover HUSPs. Furthermore, two pruning strategies are introduced in HUSP-ULL to obtain tight upper bounds on the utility of the candidate sequences and reduce the search space by pruning unpromising candidates early. Substantial experiments on both real-life and synthetic datasets showed that HUSP-ULL can effectively and efficiently discover the complete set of HUSPs and that it outperforms the state-of-the-art algorithms.</description><subject>Algorithms</subject><subject>Combinatorial analysis</subject><subject>Computer science</subject><subject>Cybernetics</subject><subject>Data mining</subject><subject>Economic behavior</subject><subject>Gallium nitride</subject><subject>Itemsets</subject><subject>linked-list structure</subject><subject>Mining</subject><subject>Route planning</subject><subject>sequence</subject><subject>Upper bound</subject><subject>Upper bounds</subject><subject>utility mining</subject><subject>utility 
theory</subject><issn>2168-2267</issn><issn>2168-2275</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkD1PwzAQhi0EolXpD0BIKBIMLC2-80fsEQoFpCIG2oHJchwHpUqTEidD_z2JWjrg5Szfc-9ZDyGXQKcAVN8vZ1-PU6RIp6hjCrE8IUMEqSaIsTg93mU8IOMQ1rQ7qnvS6pwMGAJHLvmQ3M5taKJVkxd5s4ve8zIvv6OqjD79T-tL56Mn29gLcpbZIvjxoY7Iav68nL1OFh8vb7OHxcQxjc1EuySVCI4rD8gBlBRCMArcqUxxaqXC1DKegaAZV0pmSlMnqU1lkmieIRuRu33utq669aExmzw4XxS29FUbDLKYM5CoRIfe_EPXVVuX3e8M8lgpJmLRB8KecnUVQu0zs63zja13BqjpLZreouktmoPFbub6kNwmG58eJ_6cdcDVHsi998e2pqAoZewXsIJxYg</recordid><startdate>20210201</startdate><enddate>20210201</enddate><creator>Gan, Wensheng</creator><creator>Lin, Jerry Chun-Wei</creator><creator>Zhang, Jiexiong</creator><creator>Fournier-Viger, Philippe</creator><creator>Chao, Han-Chieh</creator><creator>Yu, Philip S.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7TB</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-5781-8116</orcidid><orcidid>https://orcid.org/0000-0001-8768-9709</orcidid><orcidid>https://orcid.org/0000-0002-7680-9899</orcidid></search><sort><creationdate>20210201</creationdate><title>Fast Utility Mining on Sequence Data</title><author>Gan, Wensheng ; Lin, Jerry Chun-Wei ; Zhang, Jiexiong ; Fournier-Viger, Philippe ; Chao, Han-Chieh ; Yu, Philip 
S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c392t-9cbd621c48e12411865553014c8f840a682da34f150f4886f890c60ad6bb94f23</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Combinatorial analysis</topic><topic>Computer science</topic><topic>Cybernetics</topic><topic>Data mining</topic><topic>Economic behavior</topic><topic>Gallium nitride</topic><topic>Itemsets</topic><topic>linked-list structure</topic><topic>Mining</topic><topic>Route planning</topic><topic>sequence</topic><topic>Upper bound</topic><topic>Upper bounds</topic><topic>utility mining</topic><topic>utility theory</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Gan, Wensheng</creatorcontrib><creatorcontrib>Lin, Jerry Chun-Wei</creatorcontrib><creatorcontrib>Zhang, Jiexiong</creatorcontrib><creatorcontrib>Fournier-Viger, Philippe</creatorcontrib><creatorcontrib>Chao, Han-Chieh</creatorcontrib><creatorcontrib>Yu, Philip S.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005–Present</collection><collection>IEEE All-Society Periodicals Package (ASPP) Online</collection><collection>IEEE/IET Electronic Library (IEL)</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems 
Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on cybernetics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Gan, Wensheng</au><au>Lin, Jerry Chun-Wei</au><au>Zhang, Jiexiong</au><au>Fournier-Viger, Philippe</au><au>Chao, Han-Chieh</au><au>Yu, Philip S.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Fast Utility Mining on Sequence Data</atitle><jtitle>IEEE transactions on cybernetics</jtitle><stitle>TCYB</stitle><addtitle>IEEE Trans Cybern</addtitle><date>2021-02-01</date><risdate>2021</risdate><volume>51</volume><issue>2</issue><spage>487</spage><epage>500</epage><pages>487-500</pages><issn>2168-2267</issn><eissn>2168-2275</eissn><coden>ITCEB8</coden><abstract>High-utility sequential pattern (HUSP) mining is an emerging topic in the field of knowledge discovery in databases. It consists of discovering subsequences that have a high utility (importance) in sequences, which can be referred to as HUSPs. HUSPs can be applied to many real-life applications, such as market basket analysis, e-commerce recommendations, click-stream analysis, and route planning. Several algorithms have been proposed to efficiently mine utility-based useful sequential patterns. However, due to the combinatorial explosion of the search space for low utility threshold and large-scale data, the performances of these algorithms are unsatisfactory in terms of runtime and memory usage. Hence, this article proposes an efficient algorithm for the task of HUSP mining, called HUSP mining with UL-list (HUSP-ULL). It utilizes a lexicographic &lt;inline-formula&gt; &lt;tex-math notation="LaTeX"&gt;q &lt;/tex-math&gt;&lt;/inline-formula&gt;-sequence (LQS)-tree and a utility-linked (UL)-list structure to quickly discover HUSPs. 
Furthermore, two pruning strategies are introduced in HUSP-ULL to obtain tight upper bounds on the utility of the candidate sequences and reduce the search space by pruning unpromising candidates early. Substantial experiments on both real-life and synthetic datasets showed that HUSP-ULL can effectively and efficiently discover the complete set of HUSPs and that it outperforms the state-of-the-art algorithms.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>32142464</pmid><doi>10.1109/TCYB.2020.2970176</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0002-5781-8116</orcidid><orcidid>https://orcid.org/0000-0001-8768-9709</orcidid><orcidid>https://orcid.org/0000-0002-7680-9899</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 2168-2267
ispartof IEEE transactions on cybernetics, 2021-02, Vol.51 (2), p.487-500
issn 2168-2267
2168-2275
language eng
recordid cdi_crossref_primary_10_1109_TCYB_2020_2970176
source IEEE/IET Electronic Library (IEL)
subjects Algorithms
Combinatorial analysis
Computer science
Cybernetics
Data mining
Economic behavior
Gallium nitride
Itemsets
linked-list structure
Mining
Route planning
sequence
Upper bound
Upper bounds
utility mining
utility theory
title Fast Utility Mining on Sequence Data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-17T17%3A24%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Fast%20Utility%20Mining%20on%20Sequence%20Data&rft.jtitle=IEEE%20transactions%20on%20cybernetics&rft.au=Gan,%20Wensheng&rft.date=2021-02-01&rft.volume=51&rft.issue=2&rft.spage=487&rft.epage=500&rft.pages=487-500&rft.issn=2168-2267&rft.eissn=2168-2275&rft.coden=ITCEB8&rft_id=info:doi/10.1109/TCYB.2020.2970176&rft_dat=%3Cproquest_RIE%3E2374316285%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2478835752&rft_id=info:pmid/32142464&rft_ieee_id=9018003&rfr_iscdi=true