A new algorithm for fast mining frequent itemsets using N-lists

Mining frequent itemsets has emerged as a fundamental problem in data mining and plays an essential role in many important data mining tasks. In this paper, we propose a novel vertical data representation called N-list, which originates from an FP-tree-like coding prefix tree called PPC-tree that st...

Detailed description

Bibliographic details
Published in:	Science China. Information sciences 2012-09, Vol.55 (9), p.2008-2030
Main authors:	Deng, ZhiHong; Wang, ZhongHui; Jiang, JiaJian
Format:	Article
Language:	eng
Subjects:	Algorithms; Computer Science; Counting; Data mining; Data structures; Datasets; FP-tree; Information Systems and Communication Service; Mining; Representations; Research Paper; Stores; Strategy; Synthetic data; Tasks; Candidate itemsets; Experimental evaluation; Mining algorithms; Data representation; Frequent itemsets
Online access:	Full text

container_end_page	2030
container_issue	9
container_start_page	2008
container_title	Science China. Information sciences
container_volume	55
creator	Deng, ZhiHong; Wang, ZhongHui; Jiang, JiaJian
description	Mining frequent itemsets has emerged as a fundamental problem in data mining and plays an essential role in many important data mining tasks. In this paper, we propose a novel vertical data representation called N-list, which originates from an FP-tree-like coding prefix tree called PPC-tree that stores crucial information about frequent itemsets. Based on the N-list data structure, we develop an efficient mining algorithm, PrePost, for mining all frequent itemsets. The efficiency of PrePost rests on three properties. First, N-list is compact since transactions with common prefixes share the same nodes of the PPC-tree. Second, the counting of itemsets' supports is transformed into the intersection of N-lists, and the complexity of intersecting two N-lists can be reduced to O(m + n) by an efficient strategy, where m and n are the cardinalities of the two N-lists respectively. Third, PrePost can directly find frequent itemsets without generating candidate itemsets in some cases by making use of the single path property of N-list. We have experimentally evaluated PrePost against four state-of-the-art algorithms for mining frequent itemsets on a variety of real and synthetic datasets. The experimental results show that the PrePost algorithm is the fastest in most cases. Even though the algorithm consumes more memory when the datasets are sparse, it is still the fastest one.
doi_str_mv	10.1007/s11432-012-4638-z
format	Article
publisher	Heidelberg: SP Science China Press
rights	Science China Press and Springer-Verlag Berlin Heidelberg 2012
addtitle	Sci. China Inf. Sci; SCIENCE CHINA Information Sciences
date	2012-09-01
eissn	1869-1919
tpages	23
toplevel	peer_reviewed; online_resources
linktopdf	https://link.springer.com/content/pdf/10.1007/s11432-012-4638-z
linktohtml	https://www.proquest.com/docview/2918650461?pq-origsite=primo
fulltext	fulltext
identifier	ISSN: 1674-733X
ispartof	Science China. Information sciences, 2012-09, Vol.55 (9), p.2008-2030
issn	1674-733X 1869-1919
language	eng
recordid	cdi_proquest_miscellaneous_1671367547
source	SpringerLink Journals; Alma/SFX Local Collection; ProQuest Central
subjects	Algorithms; Computer Science; Counting; Data mining; Data structures; Datasets; FP-tree; Information Systems and Communication Service; Mining; Representations; Research Paper; Stores; Strategy; Synthetic data; Tasks; Candidate itemsets; Experimental evaluation; Mining algorithms; Data representation; Frequent itemsets
title	A new algorithm for fast mining frequent itemsets using N-lists
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T02%3A45%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20new%20algorithm%20for%20fast%20mining%20frequent%20itemsets%20using%20N-lists&rft.jtitle=Science%20China.%20Information%20sciences&rft.au=Deng,%20ZhiHong&rft.date=2012-09-01&rft.volume=55&rft.issue=9&rft.spage=2008&rft.epage=2030&rft.pages=2008-2030&rft.issn=1674-733X&rft.eissn=1869-1919&rft_id=info:doi/10.1007/s11432-012-4638-z&rft_dat=%3Cproquest_cross%3E2918650461%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918650461&rft_id=info:pmid/&rft_cqvip_id=43037988&rfr_iscdi=true
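
Illustrative note on the abstract: the description states that support counting reduces to intersecting two N-lists in O(m + n) time, but the record does not spell out the procedure. The sketch below is one plausible reading, not the authors' code. It assumes that each N-list entry carries the pre-order rank, post-order rank and count of a PPC-tree node, that each list is sorted by pre-order rank, and that a node is an ancestor of another exactly when its pre-order rank is smaller and its post-order rank is larger. The names `PPCode`, `intersect`, `support` and the sample lists are hypothetical.

```python
from typing import List, NamedTuple


class PPCode(NamedTuple):
    """Hypothetical N-list entry: pre-order rank, post-order rank, node count."""
    pre: int
    post: int
    count: int


def intersect(ancestors: List[PPCode], descendants: List[PPCode]) -> List[PPCode]:
    """Linear merge of two N-lists, each sorted by pre-order rank.

    Assumes the item behind `ancestors` always sits nearer the root than the
    item behind `descendants`. Each loop step advances one index, so the
    running time is O(m + n).
    """
    result: List[PPCode] = []
    i = j = 0
    while i < len(ancestors) and j < len(descendants):
        a, d = ancestors[i], descendants[j]
        if a.pre < d.pre:
            if a.post > d.post:
                # a is an ancestor of d: credit d's count to a's code.
                if result and result[-1].pre == a.pre:
                    last = result[-1]
                    result[-1] = last._replace(count=last.count + d.count)
                else:
                    result.append(PPCode(a.pre, a.post, d.count))
                j += 1
            else:
                # a's subtree ends before d starts; a cannot match later codes.
                i += 1
        else:
            # d starts before a, so no remaining ancestor code can cover it.
            j += 1
    return result


def support(nlist: List[PPCode]) -> int:
    """Support of an itemset is the sum of the counts in its N-list."""
    return sum(code.count for code in nlist)


# Hand-built example for the transactions {a,b,c}, {a,b}, {a,c}, {b,c},
# inserted into a prefix tree in the item order a, b, c.
N_A = [PPCode(1, 3, 3)]
N_B = [PPCode(2, 1, 2), PPCode(5, 5, 1)]
N_C = [PPCode(3, 0, 1), PPCode(4, 2, 1), PPCode(6, 4, 1)]

N_AB = intersect(N_A, N_B)
N_BC = intersect(N_B, N_C)
N_AC = intersect(N_A, N_C)
```

In this toy data each of the itemsets {a, b}, {b, c} and {a, c} occurs in two transactions, which is what `support` returns for the three intersected lists. Merging consecutive matches into a single entry keeps the result no longer than the ancestor list.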