A new algorithm for fast mining frequent itemsets using N-lists

Mining frequent itemsets has emerged as a fundamental problem in data mining and plays an essential role in many important data mining tasks. In this paper, we propose a novel vertical data representation called N-list, which originates from an FP-tree-like coding prefix tree called PPC-tree that st...

Bibliographic details
Published in: Science China. Information sciences 2012-09, Vol.55 (9), p.2008-2030
Main authors: Deng, ZhiHong; Wang, ZhongHui; Jiang, JiaJian
Format: Article
Language: eng
Online access: Full text
container_end_page 2030
container_issue 9
container_start_page 2008
container_title Science China. Information sciences
container_volume 55
creator Deng, ZhiHong
Wang, ZhongHui
Jiang, JiaJian
description Mining frequent itemsets is a fundamental problem in data mining and plays an essential role in many important data mining tasks. In this paper, we propose a novel vertical data representation called N-list, which originates from an FP-tree-like coding prefix tree called PPC-tree that stores crucial information about frequent itemsets. Based on the N-list data structure, we develop an efficient mining algorithm, PrePost, for mining all frequent itemsets. The efficiency of PrePost stems from three factors. First, the N-list is compact, since transactions with common prefixes share the same nodes of the PPC-tree. Second, counting the supports of itemsets is transformed into intersecting N-lists, and the complexity of intersecting two N-lists can be reduced to O(m + n) by an efficient strategy, where m and n are the cardinalities of the two N-lists. Third, in some cases PrePost can find frequent itemsets directly, without generating candidate itemsets, by using the single-path property of N-lists. We have experimentally evaluated PrePost against four state-of-the-art algorithms for mining frequent itemsets on a variety of real and synthetic datasets. The experimental results show that PrePost is the fastest in most cases. Even though the algorithm consumes more memory when the datasets are sparse, it is still the fastest one.
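The abstract's second point, support counting by a linear-time N-list intersection, can be illustrated with a small sketch. This is not the authors' code: it assumes that each PPC-tree node carries a pre-order rank, a post-order rank and a count, that an item's N-list is the list of such triples sorted by pre-order rank, and that node A is an ancestor of node B exactly when A's pre-order rank is smaller and its post-order rank is larger. The names `Node`, `build_n_lists`, `intersect` and `support` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


class Node:
    """One node of a prefix tree; hypothetical stand-in for a PPC-tree node."""

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}  # item -> Node


def build_n_lists(transactions, item_order):
    """Build the prefix tree and return {item: [(pre, post, count), ...]}."""
    rank = {item: r for r, item in enumerate(item_order)}
    root = Node(None)
    for t in transactions:
        node = root
        # Insert items in the fixed global order so common prefixes share nodes.
        for item in sorted(set(t), key=rank.__getitem__):
            node = node.children.setdefault(item, Node(item))
            node.count += 1

    n_lists = defaultdict(list)
    counters = {"pre": 0, "post": 0}

    def visit(node):
        pre = counters["pre"]
        counters["pre"] += 1
        for child in node.children.values():
            visit(child)
        post = counters["post"]
        counters["post"] += 1
        if node.item is not None:
            n_lists[node.item].append((pre, post, node.count))

    visit(root)
    for codes in n_lists.values():
        codes.sort()  # ascending pre-order rank
    return n_lists


def intersect(anc, desc):
    """Merge two N-lists in O(m + n); `anc` belongs to the item earlier in item_order.

    Assumption of this sketch: the result keeps the ancestor's (pre, post)
    and sums the counts of its descendants found in `desc`.
    """
    out = []
    i = j = 0
    while i < len(anc) and j < len(desc):
        a_pre, a_post, _ = anc[i]
        d_pre, d_post, d_cnt = desc[j]
        if a_pre < d_pre:
            if a_post > d_post:  # anc[i] is an ancestor of desc[j]
                if out and out[-1][0] == a_pre:
                    out[-1][2] += d_cnt
                else:
                    out.append([a_pre, a_post, d_cnt])
                j += 1
            else:  # anc[i]'s subtree ends before desc[j]; it cannot match later codes
                i += 1
        else:  # desc[j] precedes anc[i] in pre-order; it has no ancestor left in anc
            j += 1
    return [tuple(e) for e in out]


def support(n_list):
    """Support of an itemset is the sum of the counts in its N-list."""
    return sum(count for _, _, count in n_list)


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
nl = build_n_lists(transactions, item_order=["a", "b", "c"])
print(support(intersect(nl["a"], nl["b"])))  # 2
print(support(intersect(nl["b"], nl["c"])))  # 2
```

Each pointer only moves forward, so the loop runs at most m + n times, which is the bound the abstract cites. The single-path shortcut mentioned as the third factor is not covered by this sketch.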
doi_str_mv 10.1007/s11432-012-4638-z
format Article
date 2012-09-01
publisher Heidelberg: SP Science China Press
rights Science China Press and Springer-Verlag Berlin Heidelberg 2012
tpages 23
fulltext fulltext
identifier ISSN: 1674-733X
ispartof Science China. Information sciences, 2012-09, Vol.55 (9), p.2008-2030
issn 1674-733X
1869-1919
language eng
recordid cdi_proquest_miscellaneous_1671367547
source SpringerLink Journals; Alma/SFX Local Collection; ProQuest Central
subjects Algorithms
Computer Science
Counting
Data mining
Data structures
Datasets
FP-树
Information Systems and Communication Service
Mining
Representations
Research Paper
Stores
Strategy
Synthetic data
Tasks
候选项目集
实验评价
挖掘算法
数据挖掘
数据结构
数据表示
频繁项集
title A new algorithm for fast mining frequent itemsets using N-lists
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T02%3A45%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20new%20algorithm%20for%20fast%20mining%20frequent%20itemsets%20using%20N-lists&rft.jtitle=Science%20China.%20Information%20sciences&rft.au=Deng,%20ZhiHong&rft.date=2012-09-01&rft.volume=55&rft.issue=9&rft.spage=2008&rft.epage=2030&rft.pages=2008-2030&rft.issn=1674-733X&rft.eissn=1869-1919&rft_id=info:doi/10.1007/s11432-012-4638-z&rft_dat=%3Cproquest_cross%3E2918650461%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918650461&rft_id=info:pmid/&rft_cqvip_id=43037988&rfr_iscdi=true