A new algorithm for fast mining frequent itemsets using N-lists
Mining frequent itemsets has emerged as a fundamental problem in data mining and plays an essential role in many important data mining tasks. In this paper, we propose a novel vertical data representation called N-list, which originates from an FP-tree-like coding prefix tree called PPC-tree that st...
Published in: | Science China. Information sciences 2012-09, Vol.55 (9), p.2008-2030 |
---|---|
Main authors: | Deng, ZhiHong ; Wang, ZhongHui ; Jiang, JiaJian |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 2030 |
---|---|
container_issue | 9 |
container_start_page | 2008 |
container_title | Science China. Information sciences |
container_volume | 55 |
creator | Deng, ZhiHong ; Wang, ZhongHui ; Jiang, JiaJian |
description | Mining frequent itemsets has emerged as a fundamental problem in data mining and plays an essential role in many important data mining tasks. In this paper, we propose a novel vertical data representation called N-list, which originates from an FP-tree-like coding prefix tree called PPC-tree that stores crucial information about frequent itemsets. Based on the N-list data structure, we develop an efficient mining algorithm, PrePost, for mining all frequent itemsets. PrePost owes its efficiency to three properties. First, the N-list is compact, since transactions with common prefixes share the same nodes of the PPC-tree. Second, counting the supports of itemsets is transformed into intersecting N-lists, and the complexity of intersecting two N-lists can be reduced to O(m + n) by an efficient strategy, where m and n are the cardinalities of the two N-lists, respectively. Third, in some cases PrePost can find frequent itemsets directly, without generating candidate itemsets, by making use of the single-path property of the N-list. We have experimentally evaluated PrePost against four state-of-the-art algorithms for mining frequent itemsets on a variety of real and synthetic datasets. The experimental results show that the PrePost algorithm is the fastest in most cases. Even though the algorithm consumes more memory when the datasets are sparse, it is still the fastest one. |
doi_str_mv | 10.1007/s11432-012-4638-z |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1671367547</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><cqvip_id>43037988</cqvip_id><sourcerecordid>2918650461</sourcerecordid><originalsourceid>FETCH-LOGICAL-c375t-a7b823ca3d01cb471fb451b35119760585f33abb0c71a2578b85f23424e3a2d03</originalsourceid><addsrcrecordid>eNp9kE1LxDAQhosouKz7A7xVvHiJZjJJk55kWfwC0YuCt5B2026XfuwmLeL-erN0UfBgLgnD885Mnig6B3oNlMobD8CREQqM8AQV2R1FE1BJSiCF9Di8E8mJRPw4jWber2k4iJRJNYlu53FrP2NTl52r-lUTF52LC-P7uKnaqi3jwtntYNs-rnrbeNv7ePD7-gupK9_7s-ikMLW3s8M9jd7v794Wj-T59eFpMX8mOUrREyMzxTA3uKSQZ1xCkXEBGQqAVCZUKFEgmiyjuQTDhFRZqDDkjFs0bElxGl2NfTeuC_v4XjeVz21dm9Z2g9fhi4CJFFwG9PIPuu4G14btNEuDFkF5AoGCkcpd572zhd64qjHuSwPVe6t6tKqDVb23qnchw8aMD2xbWvfb-b_QxWHQqmvLbcj9TOJIUaZK4Tdb2IOV</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2918650461</pqid></control><display><type>article</type><title>A new algorithm for fast mining frequent itemsets using N-lists</title><source>SpringerLink Journals</source><source>Alma/SFX Local Collection</source><source>ProQuest Central</source><creator>Deng, ZhiHong ; Wang, ZhongHui ; Jiang, JiaJian</creator><creatorcontrib>Deng, ZhiHong ; Wang, ZhongHui ; Jiang, JiaJian</creatorcontrib><description>Mining frequent itemsets has emerged as a fundamental problem in data mining and plays an essential role in many important data mining tasks. In this paper, we propose a novel vertical data representation called N-list, which originates from an FP-tree-like coding prefix tree called PPC-tree that stores crucial information about frequent itemsets. Based on the N-list data structure, we develop an efficient mining algorithm, PrePost, for mining all frequent itemsets. Efficiency of PrePost is achieved by the following three reasons. First, N-list is compact since transactions with common prefixes share the same nodes of the PPC-tree. 
Second, the counting of itemsets' supports is transformed into the intersection of N-lists and the complexity of intersecting two N-lists can be reduced to O(m + n) by an efficient strategy, where m and n are the cardinalities of the two N-lists respectively. Third, PrePost can directly find frequent itemsets without generating candidate itemsets in some cases by making use of the single path property of N-list. We have experimentally evaluated PrePost against four state-of-the-art algorithms for mining frequent itemsets on a variety of real and synthetic datasets. The experimental results show that the PrePost algorithm is the fastest in most cases. Even though the algorithm consumes more memory when the datasets are sparse, it is still the fastest one.</description><identifier>ISSN: 1674-733X</identifier><identifier>EISSN: 1869-1919</identifier><identifier>DOI: 10.1007/s11432-012-4638-z</identifier><language>eng</language><publisher>Heidelberg: SP Science China Press</publisher><subject>Algorithms ; Computer Science ; Counting ; Data mining ; Data structures ; Datasets ; FP-树 ; Information Systems and Communication Service ; Mining ; Representations ; Research Paper ; Stores ; Strategy ; Synthetic data ; Tasks ; 候选项目集 ; 实验评价 ; 挖掘算法 ; 数据挖掘 ; 数据结构 ; 数据表示 ; 频繁项集</subject><ispartof>Science China. 
Information sciences, 2012-09, Vol.55 (9), p.2008-2030</ispartof><rights>Science China Press and Springer-Verlag Berlin Heidelberg 2012</rights><rights>Science China Press and Springer-Verlag Berlin Heidelberg 2012.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c375t-a7b823ca3d01cb471fb451b35119760585f33abb0c71a2578b85f23424e3a2d03</citedby><cites>FETCH-LOGICAL-c375t-a7b823ca3d01cb471fb451b35119760585f33abb0c71a2578b85f23424e3a2d03</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttp://image.cqvip.com/vip1000/qk/84009A/84009A.jpg</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11432-012-4638-z$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2918650461?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,776,780,21367,27901,27902,33721,33722,41464,42533,43781,51294</link.rule.ids></links><search><creatorcontrib>Deng, ZhiHong</creatorcontrib><creatorcontrib>Wang, ZhongHui</creatorcontrib><creatorcontrib>Jiang, JiaJian</creatorcontrib><title>A new algorithm for fast mining frequent itemsets using N-lists</title><title>Science China. Information sciences</title><addtitle>Sci. China Inf. Sci</addtitle><addtitle>SCIENCE CHINA Information Sciences</addtitle><description>Mining frequent itemsets has emerged as a fundamental problem in data mining and plays an essential role in many important data mining tasks. In this paper, we propose a novel vertical data representation called N-list, which originates from an FP-tree-like coding prefix tree called PPC-tree that stores crucial information about frequent itemsets. Based on the N-list data structure, we develop an efficient mining algorithm, PrePost, for mining all frequent itemsets. Efficiency of PrePost is achieved by the following three reasons. 
First, N-list is compact since transactions with common prefixes share the same nodes of the PPC-tree. Second, the counting of itemsets' supports is transformed into the intersection of N-lists and the complexity of intersecting two N-lists can be reduced to O(m + n) by an efficient strategy, where m and n are the cardinalities of the two N-lists respectively. Third, PrePost can directly find frequent itemsets without generating candidate itemsets in some cases by making use of the single path property of N-list. We have experimentally evaluated PrePost against four state-of-the-art algorithms for mining frequent itemsets on a variety of real and synthetic datasets. The experimental results show that the PrePost algorithm is the fastest in most cases. Even though the algorithm consumes more memory when the datasets are sparse, it is still the fastest one.</description><subject>Algorithms</subject><subject>Computer Science</subject><subject>Counting</subject><subject>Data mining</subject><subject>Data structures</subject><subject>Datasets</subject><subject>FP-树</subject><subject>Information Systems and Communication Service</subject><subject>Mining</subject><subject>Representations</subject><subject>Research Paper</subject><subject>Stores</subject><subject>Strategy</subject><subject>Synthetic 
data</subject><subject>Tasks</subject><subject>候选项目集</subject><subject>实验评价</subject><subject>挖掘算法</subject><subject>数据挖掘</subject><subject>数据结构</subject><subject>数据表示</subject><subject>频繁项集</subject><issn>1674-733X</issn><issn>1869-1919</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2012</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNp9kE1LxDAQhosouKz7A7xVvHiJZjJJk55kWfwC0YuCt5B2026XfuwmLeL-erN0UfBgLgnD885Mnig6B3oNlMobD8CREQqM8AQV2R1FE1BJSiCF9Di8E8mJRPw4jWber2k4iJRJNYlu53FrP2NTl52r-lUTF52LC-P7uKnaqi3jwtntYNs-rnrbeNv7ePD7-gupK9_7s-ikMLW3s8M9jd7v794Wj-T59eFpMX8mOUrREyMzxTA3uKSQZ1xCkXEBGQqAVCZUKFEgmiyjuQTDhFRZqDDkjFs0bElxGl2NfTeuC_v4XjeVz21dm9Z2g9fhi4CJFFwG9PIPuu4G14btNEuDFkF5AoGCkcpd572zhd64qjHuSwPVe6t6tKqDVb23qnchw8aMD2xbWvfb-b_QxWHQqmvLbcj9TOJIUaZK4Tdb2IOV</recordid><startdate>20120901</startdate><enddate>20120901</enddate><creator>Deng, ZhiHong</creator><creator>Wang, ZhongHui</creator><creator>Jiang, JiaJian</creator><general>SP Science China Press</general><general>Springer Nature B.V</general><scope>2RA</scope><scope>92L</scope><scope>CQIGP</scope><scope>W92</scope><scope>~WA</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>7SC</scope><scope>8FD</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20120901</creationdate><title>A new algorithm for fast mining frequent itemsets using N-lists</title><author>Deng, ZhiHong ; Wang, ZhongHui ; Jiang, 
JiaJian</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c375t-a7b823ca3d01cb471fb451b35119760585f33abb0c71a2578b85f23424e3a2d03</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2012</creationdate><topic>Algorithms</topic><topic>Computer Science</topic><topic>Counting</topic><topic>Data mining</topic><topic>Data structures</topic><topic>Datasets</topic><topic>FP-树</topic><topic>Information Systems and Communication Service</topic><topic>Mining</topic><topic>Representations</topic><topic>Research Paper</topic><topic>Stores</topic><topic>Strategy</topic><topic>Synthetic data</topic><topic>Tasks</topic><topic>候选项目集</topic><topic>实验评价</topic><topic>挖掘算法</topic><topic>数据挖掘</topic><topic>数据结构</topic><topic>数据表示</topic><topic>频繁项集</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Deng, ZhiHong</creatorcontrib><creatorcontrib>Wang, ZhongHui</creatorcontrib><creatorcontrib>Jiang, JiaJian</creatorcontrib><collection>中文科技期刊数据库</collection><collection>中文科技期刊数据库-CALIS站点</collection><collection>中文科技期刊数据库-7.0平台</collection><collection>中文科技期刊数据库-工程技术</collection><collection>中文科技期刊数据库- 镜像站点</collection><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies & Aerospace 
Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Science China. Information sciences</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Deng, ZhiHong</au><au>Wang, ZhongHui</au><au>Jiang, JiaJian</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A new algorithm for fast mining frequent itemsets using N-lists</atitle><jtitle>Science China. Information sciences</jtitle><stitle>Sci. China Inf. Sci</stitle><addtitle>SCIENCE CHINA Information Sciences</addtitle><date>2012-09-01</date><risdate>2012</risdate><volume>55</volume><issue>9</issue><spage>2008</spage><epage>2030</epage><pages>2008-2030</pages><issn>1674-733X</issn><eissn>1869-1919</eissn><abstract>Mining frequent itemsets has emerged as a fundamental problem in data mining and plays an essential role in many important data mining tasks. In this paper, we propose a novel vertical data representation called N-list, which originates from an FP-tree-like coding prefix tree called PPC-tree that stores crucial information about frequent itemsets. Based on the N-list data structure, we develop an efficient mining algorithm, PrePost, for mining all frequent itemsets. Efficiency of PrePost is achieved by the following three reasons. First, N-list is compact since transactions with common prefixes share the same nodes of the PPC-tree. 
Second, the counting of itemsets' supports is transformed into the intersection of N-lists and the complexity of intersecting two N-lists can be reduced to O(m + n) by an efficient strategy, where m and n are the cardinalities of the two N-lists respectively. Third, PrePost can directly find frequent itemsets without generating candidate itemsets in some cases by making use of the single path property of N-list. We have experimentally evaluated PrePost against four state-of-the-art algorithms for mining frequent itemsets on a variety of real and synthetic datasets. The experimental results show that the PrePost algorithm is the fastest in most cases. Even though the algorithm consumes more memory when the datasets are sparse, it is still the fastest one.</abstract><cop>Heidelberg</cop><pub>SP Science China Press</pub><doi>10.1007/s11432-012-4638-z</doi><tpages>23</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1674-733X |
ispartof | Science China. Information sciences, 2012-09, Vol.55 (9), p.2008-2030 |
issn | 1674-733X 1869-1919 |
language | eng |
recordid | cdi_proquest_miscellaneous_1671367547 |
source | SpringerLink Journals; Alma/SFX Local Collection; ProQuest Central |
subjects | Algorithms ; Computer Science ; Counting ; Data mining ; Data structures ; Datasets ; FP-树 ; Information Systems and Communication Service ; Mining ; Representations ; Research Paper ; Stores ; Strategy ; Synthetic data ; Tasks ; 候选项目集 ; 实验评价 ; 挖掘算法 ; 数据挖掘 ; 数据结构 ; 数据表示 ; 频繁项集 |
title | A new algorithm for fast mining frequent itemsets using N-lists |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T02%3A45%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20new%20algorithm%20for%20fast%20mining%20frequent%20itemsets%20using%20N-lists&rft.jtitle=Science%20China.%20Information%20sciences&rft.au=Deng,%20ZhiHong&rft.date=2012-09-01&rft.volume=55&rft.issue=9&rft.spage=2008&rft.epage=2030&rft.pages=2008-2030&rft.issn=1674-733X&rft.eissn=1869-1919&rft_id=info:doi/10.1007/s11432-012-4638-z&rft_dat=%3Cproquest_cross%3E2918650461%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918650461&rft_id=info:pmid/&rft_cqvip_id=43037988&rfr_iscdi=true |
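
The description in this record states that two N-lists can be intersected in O(m + n) time, but gives no details. The Python sketch below is one plausible way to do it, not the authors' code. It assumes each N-list entry is a tuple `(pre_order, post_order, count)` taken from a PPC-tree node, that each list is sorted by ascending pre-order rank, and that a node is an ancestor of another exactly when its pre-order rank is smaller and its post-order rank is larger. The entry layout, the function name `intersect_nlists`, and the toy tree are illustrative assumptions.

```python
from typing import List, Tuple

# Assumed entry layout: (pre_order, post_order, count) of one PPC-tree node.
Entry = Tuple[int, int, int]


def intersect_nlists(ancestor_nl: List[Entry], descendant_nl: List[Entry]) -> List[Entry]:
    """Two-pointer merge of two N-lists, each sorted by ascending pre-order.

    For every descendant entry that lies below some ancestor entry, the
    descendant's count is credited to that ancestor. Each loop iteration
    advances one pointer, so the loop runs at most m + n times.
    """
    result: List[Entry] = []
    i = j = 0
    while i < len(ancestor_nl) and j < len(descendant_nl):
        a_pre, a_post, _ = ancestor_nl[i]
        d_pre, d_post, d_count = descendant_nl[j]
        if a_pre < d_pre:
            if a_post > d_post:
                # Ancestor-descendant pair found: merge counts per ancestor node.
                if result and result[-1][0] == a_pre:
                    last_pre, last_post, last_count = result[-1]
                    result[-1] = (last_pre, last_post, last_count + d_count)
                else:
                    result.append((a_pre, a_post, d_count))
                j += 1
            else:
                # This ancestor's subtree ends before d starts; try the next one.
                i += 1
        else:
            # d precedes the current ancestor in pre-order; it cannot match.
            j += 1
    return result


def support(nlist: List[Entry]) -> int:
    """Support of an itemset is the sum of the counts in its N-list."""
    return sum(count for _, _, count in nlist)


# Toy PPC-tree (hypothetical data) built from transactions abc, ab, ac, bc:
#   root -> a(3) -> b(2) -> c(1)
#                -> c(1)
#        -> b(1) -> c(1)
nl_a = [(1, 3, 3)]
nl_b = [(2, 1, 2), (5, 5, 1)]
nl_c = [(3, 0, 1), (4, 2, 1), (6, 4, 1)]

nl_ab = intersect_nlists(nl_a, nl_b)
nl_ac = intersect_nlists(nl_a, nl_c)
nl_bc = intersect_nlists(nl_b, nl_c)
```

In this toy data, `support(nl_ab)`, `support(nl_ac)`, and `support(nl_bc)` each evaluate to 2, matching the number of transactions that contain both items. The linear bound relies on nodes of the same item never being ancestors of one another, so within one N-list the post-order ranks increase together with the pre-order ranks.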