Efficient mining of maximal frequent itemsets from databases on a cluster of workstations
In this paper, we propose two parallel algorithms for mining maximal frequent itemsets from databases. A frequent itemset is maximal if none of its supersets is frequent. One parallel algorithm is named distributed max-miner (DMM), and it requires very low communication and synchronization overhead...
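As an illustration of the definition in the abstract (a frequent itemset is maximal if none of its supersets is frequent), the following is a minimal brute-force sketch. It is not the DMM or PMM algorithm from the paper; the function names `frequent_itemsets` and `maximal_itemsets`, the toy transactions, and the absolute support threshold are all assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Enumerate every frequent itemset by brute force (tiny inputs only).

    Hypothetical helper: min_support is an absolute transaction count.
    """
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            # Support = number of transactions that contain the itemset.
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_support:
                frequent[itemset] = count
    return frequent


def maximal_itemsets(frequent):
    """Keep only the frequent itemsets with no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]


# Toy database (assumed data, not from the paper).
transactions = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("cd")]
freq = frequent_itemsets(transactions, min_support=2)
maximal = maximal_itemsets(freq)
```

With this toy data, seven itemsets are frequent but only one of them is maximal, which is why reporting maximal itemsets gives a much more compact result when patterns are long.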
Published in: | Knowledge and information systems 2008-09, Vol.16 (3), p.359-391 |
---|---|
Main authors: | Chung, Soon M. ; Luo, Congnan |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 391 |
---|---|
container_issue | 3 |
container_start_page | 359 |
container_title | Knowledge and information systems |
container_volume | 16 |
creator | Chung, Soon M. ; Luo, Congnan |
description | In this paper, we propose two parallel algorithms for mining maximal frequent itemsets from databases. A frequent itemset is maximal if none of its supersets is frequent. One parallel algorithm is named distributed max-miner (DMM), and it requires very low communication and synchronization overhead in distributed computing systems. DMM has the local mining phase and the global mining phase. During the local mining phase, each node mines the local database to discover the local maximal frequent itemsets, then they form a set of maximal candidate itemsets for the top-down search in the subsequent global mining phase. A new prefix tree data structure is developed to facilitate the storage and counting of the global candidate itemsets of different sizes. This global mining phase using the prefix tree can work with any local mining algorithm. Another parallel algorithm, named parallel max-miner (PMM), is a parallel version of the sequential max-miner algorithm (Proc of ACM SIGMOD Int Conf on Management of Data, 1998, pp 85–93). Most of existing mining algorithms discover the frequent k-itemsets on the kth pass over the databases, and then generate the candidate (k + 1)-itemsets for the next pass. Compared to those level-wise algorithms, PMM looks ahead at each pass and prunes more candidate itemsets by checking the frequencies of their supersets. Both DMM and PMM were implemented on a cluster of workstations, and their performance was evaluated for various cases. They demonstrate very good performance and scalability even when there are large maximal frequent itemsets (i.e., long patterns) in databases. |
doi_str_mv | 10.1007/s10115-007-0115-1 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_34860645</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1543360951</sourcerecordid><originalsourceid>FETCH-LOGICAL-c407t-907721126d4e79e5ec586d390d5e7865f5f6de654b9770a7bc33c18fe37dae713</originalsourceid><addsrcrecordid>eNp1kE9LxDAQxYsoqKsfwFsR9FbNtE2yPcriPxC86MFTyaaTJWubaCaL-u1NdxcFwdM8Zn7zeLwsOwF2AYzJSwIGwIski7WAneyAldAUFYDY3WqopNzPDomWjIEUAAfZy7UxVlt0MR-ss26Re5MP6tMOqs9NwPfVeLIRB8JIaeOHvFNRzRUh5d7lKtf9iiKG8fHDh1eKKlrv6CjbM6onPN7OSfZ8c_00uyseHm_vZ1cPha6ZjEXDpCwBStHVKBvkqPlUdFXDOo5yKrjhRnQoeD1vpGRKznVVaZgarGSnUEI1yc43vm_Bp7QU28GSxr5XDv2K2qqeCiZqnsDTP-DSr4JL2dqS1amnZJYg2EA6eKKApn0LqYvw1QJrx6bbTdPtKNdiTHC2NVakVW-CctrSz2PJRNlwXieu3HCUTm6B4TfA_-bf0pGNbg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>204311713</pqid></control><display><type>article</type><title>Efficient mining of maximal frequent itemsets from databases on a cluster of workstations</title><source>Springer Nature - Complete Springer Journals</source><creator>Chung, Soon M. ; Luo, Congnan</creator><creatorcontrib>Chung, Soon M. ; Luo, Congnan</creatorcontrib><description>In this paper, we propose two parallel algorithms for mining maximal frequent itemsets from databases. A frequent itemset is maximal if none of its supersets is frequent. One parallel algorithm is named
distributed max-miner
(DMM), and it requires very low communication and synchronization overhead in distributed computing systems. DMM has the local mining phase and the global mining phase. During the local mining phase, each node mines the local database to discover the local maximal frequent itemsets, then they form a set of maximal candidate itemsets for the top-down search in the subsequent global mining phase. A new prefix tree data structure is developed to facilitate the storage and counting of the global candidate itemsets of different sizes. This global mining phase using the prefix tree can work with any local mining algorithm. Another parallel algorithm, named
parallel max-miner
(PMM), is a parallel version of the sequential max-miner algorithm (Proc of ACM SIGMOD Int Conf on Management of Data, 1998, pp 85–93). Most of existing mining algorithms discover the frequent
k
-itemsets on the
k
th pass over the databases, and then generate the candidate (
k
+ 1)-itemsets for the next pass. Compared to those level-wise algorithms, PMM looks ahead at each pass and prunes more candidate itemsets by checking the frequencies of their supersets. Both DMM and PMM were implemented on a cluster of workstations, and their performance was evaluated for various cases. They demonstrate very good performance and scalability even when there are large maximal frequent itemsets (i.e., long patterns) in databases.</description><identifier>ISSN: 0219-1377</identifier><identifier>EISSN: 0219-3116</identifier><identifier>DOI: 10.1007/s10115-007-0115-1</identifier><identifier>CODEN: KISNCR</identifier><language>eng</language><publisher>London: Springer-Verlag</publisher><subject>Algorithms ; Applied sciences ; Computer Science ; Computer science; control theory; systems ; Computer systems and distributed systems. User interface ; Data mining ; Data Mining and Knowledge Discovery ; Data processing. List processing. Character string processing ; Database Management ; Exact sciences and technology ; Graphs ; Information Storage and Retrieval ; Information Systems and Communication Service ; Information Systems Applications (incl.Internet) ; Information systems. Data bases ; IT in Business ; Memory organisation. 
Data processing ; Regular Paper ; Software ; Studies ; Work stations</subject><ispartof>Knowledge and information systems, 2008-09, Vol.16 (3), p.359-391</ispartof><rights>Springer-Verlag London Limited 2007</rights><rights>2009 INIST-CNRS</rights><rights>Springer-Verlag London Limited 2008</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c407t-907721126d4e79e5ec586d390d5e7865f5f6de654b9770a7bc33c18fe37dae713</citedby><cites>FETCH-LOGICAL-c407t-907721126d4e79e5ec586d390d5e7865f5f6de654b9770a7bc33c18fe37dae713</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10115-007-0115-1$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10115-007-0115-1$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27903,27904,41467,42536,51298</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=20629554$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Chung, Soon M.</creatorcontrib><creatorcontrib>Luo, Congnan</creatorcontrib><title>Efficient mining of maximal frequent itemsets from databases on a cluster of workstations</title><title>Knowledge and information systems</title><addtitle>Knowl Inf Syst</addtitle><description>In this paper, we propose two parallel algorithms for mining maximal frequent itemsets from databases. A frequent itemset is maximal if none of its supersets is frequent. One parallel algorithm is named
distributed max-miner
(DMM), and it requires very low communication and synchronization overhead in distributed computing systems. DMM has the local mining phase and the global mining phase. During the local mining phase, each node mines the local database to discover the local maximal frequent itemsets, then they form a set of maximal candidate itemsets for the top-down search in the subsequent global mining phase. A new prefix tree data structure is developed to facilitate the storage and counting of the global candidate itemsets of different sizes. This global mining phase using the prefix tree can work with any local mining algorithm. Another parallel algorithm, named
parallel max-miner
(PMM), is a parallel version of the sequential max-miner algorithm (Proc of ACM SIGMOD Int Conf on Management of Data, 1998, pp 85–93). Most of existing mining algorithms discover the frequent
k
-itemsets on the
k
th pass over the databases, and then generate the candidate (
k
+ 1)-itemsets for the next pass. Compared to those level-wise algorithms, PMM looks ahead at each pass and prunes more candidate itemsets by checking the frequencies of their supersets. Both DMM and PMM were implemented on a cluster of workstations, and their performance was evaluated for various cases. They demonstrate very good performance and scalability even when there are large maximal frequent itemsets (i.e., long patterns) in databases.</description><subject>Algorithms</subject><subject>Applied sciences</subject><subject>Computer Science</subject><subject>Computer science; control theory; systems</subject><subject>Computer systems and distributed systems. User interface</subject><subject>Data mining</subject><subject>Data Mining and Knowledge Discovery</subject><subject>Data processing. List processing. Character string processing</subject><subject>Database Management</subject><subject>Exact sciences and technology</subject><subject>Graphs</subject><subject>Information Storage and Retrieval</subject><subject>Information Systems and Communication Service</subject><subject>Information Systems Applications (incl.Internet)</subject><subject>Information systems. Data bases</subject><subject>IT in Business</subject><subject>Memory organisation. 
Data processing</subject><subject>Regular Paper</subject><subject>Software</subject><subject>Studies</subject><subject>Work stations</subject><issn>0219-1377</issn><issn>0219-3116</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2008</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp1kE9LxDAQxYsoqKsfwFsR9FbNtE2yPcriPxC86MFTyaaTJWubaCaL-u1NdxcFwdM8Zn7zeLwsOwF2AYzJSwIGwIski7WAneyAldAUFYDY3WqopNzPDomWjIEUAAfZy7UxVlt0MR-ss26Re5MP6tMOqs9NwPfVeLIRB8JIaeOHvFNRzRUh5d7lKtf9iiKG8fHDh1eKKlrv6CjbM6onPN7OSfZ8c_00uyseHm_vZ1cPha6ZjEXDpCwBStHVKBvkqPlUdFXDOo5yKrjhRnQoeD1vpGRKznVVaZgarGSnUEI1yc43vm_Bp7QU28GSxr5XDv2K2qqeCiZqnsDTP-DSr4JL2dqS1amnZJYg2EA6eKKApn0LqYvw1QJrx6bbTdPtKNdiTHC2NVakVW-CctrSz2PJRNlwXieu3HCUTm6B4TfA_-bf0pGNbg</recordid><startdate>20080901</startdate><enddate>20080901</enddate><creator>Chung, Soon M.</creator><creator>Luo, Congnan</creator><general>Springer-Verlag</general><general>Springer</general><general>Springer Nature 
B.V</general><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>0U~</scope><scope>1-H</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L.0</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope></search><sort><creationdate>20080901</creationdate><title>Efficient mining of maximal frequent itemsets from databases on a cluster of workstations</title><author>Chung, Soon M. ; Luo, Congnan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c407t-907721126d4e79e5ec586d390d5e7865f5f6de654b9770a7bc33c18fe37dae713</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2008</creationdate><topic>Algorithms</topic><topic>Applied sciences</topic><topic>Computer Science</topic><topic>Computer science; control theory; systems</topic><topic>Computer systems and distributed systems. User interface</topic><topic>Data mining</topic><topic>Data Mining and Knowledge Discovery</topic><topic>Data processing. List processing. 
Character string processing</topic><topic>Database Management</topic><topic>Exact sciences and technology</topic><topic>Graphs</topic><topic>Information Storage and Retrieval</topic><topic>Information Systems and Communication Service</topic><topic>Information Systems Applications (incl.Internet)</topic><topic>Information systems. Data bases</topic><topic>IT in Business</topic><topic>Memory organisation. Data processing</topic><topic>Regular Paper</topic><topic>Software</topic><topic>Studies</topic><topic>Work stations</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Chung, Soon M.</creatorcontrib><creatorcontrib>Luo, Congnan</creatorcontrib><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Global News & ABI/Inform Professional</collection><collection>Trade PRO</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Technology 
Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>ABI/INFORM Professional Standard</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><jtitle>Knowledge and information systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Chung, Soon M.</au><au>Luo, Congnan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Efficient mining of maximal frequent itemsets from databases on a cluster of workstations</atitle><jtitle>Knowledge and information systems</jtitle><stitle>Knowl Inf 
Syst</stitle><date>2008-09-01</date><risdate>2008</risdate><volume>16</volume><issue>3</issue><spage>359</spage><epage>391</epage><pages>359-391</pages><issn>0219-1377</issn><eissn>0219-3116</eissn><coden>KISNCR</coden><abstract>In this paper, we propose two parallel algorithms for mining maximal frequent itemsets from databases. A frequent itemset is maximal if none of its supersets is frequent. One parallel algorithm is named
distributed max-miner
(DMM), and it requires very low communication and synchronization overhead in distributed computing systems. DMM has the local mining phase and the global mining phase. During the local mining phase, each node mines the local database to discover the local maximal frequent itemsets, then they form a set of maximal candidate itemsets for the top-down search in the subsequent global mining phase. A new prefix tree data structure is developed to facilitate the storage and counting of the global candidate itemsets of different sizes. This global mining phase using the prefix tree can work with any local mining algorithm. Another parallel algorithm, named
parallel max-miner
(PMM), is a parallel version of the sequential max-miner algorithm (Proc of ACM SIGMOD Int Conf on Management of Data, 1998, pp 85–93). Most of existing mining algorithms discover the frequent
k
-itemsets on the
k
th pass over the databases, and then generate the candidate (
k
+ 1)-itemsets for the next pass. Compared to those level-wise algorithms, PMM looks ahead at each pass and prunes more candidate itemsets by checking the frequencies of their supersets. Both DMM and PMM were implemented on a cluster of workstations, and their performance was evaluated for various cases. They demonstrate very good performance and scalability even when there are large maximal frequent itemsets (i.e., long patterns) in databases.</abstract><cop>London</cop><pub>Springer-Verlag</pub><doi>10.1007/s10115-007-0115-1</doi><tpages>33</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0219-1377 |
ispartof | Knowledge and information systems, 2008-09, Vol.16 (3), p.359-391 |
issn | 0219-1377 ; 0219-3116 |
language | eng |
recordid | cdi_proquest_miscellaneous_34860645 |
source | Springer Nature - Complete Springer Journals |
subjects | Algorithms ; Applied sciences ; Computer Science ; Computer science; control theory; systems ; Computer systems and distributed systems. User interface ; Data mining ; Data Mining and Knowledge Discovery ; Data processing. List processing. Character string processing ; Database Management ; Exact sciences and technology ; Graphs ; Information Storage and Retrieval ; Information Systems and Communication Service ; Information Systems Applications (incl.Internet) ; Information systems. Data bases ; IT in Business ; Memory organisation. Data processing ; Regular Paper ; Software ; Studies ; Work stations |
title | Efficient mining of maximal frequent itemsets from databases on a cluster of workstations |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T11%3A15%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Efficient%20mining%20of%20maximal%20frequent%20itemsets%20from%20databases%20on%20a%20cluster%20of%20workstations&rft.jtitle=Knowledge%20and%20information%20systems&rft.au=Chung,%20Soon%20M.&rft.date=2008-09-01&rft.volume=16&rft.issue=3&rft.spage=359&rft.epage=391&rft.pages=359-391&rft.issn=0219-1377&rft.eissn=0219-3116&rft.coden=KISNCR&rft_id=info:doi/10.1007/s10115-007-0115-1&rft_dat=%3Cproquest_cross%3E1543360951%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=204311713&rft_id=info:pmid/&rfr_iscdi=true |
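
The two-phase structure that the description field attributes to DMM (local mining on each node, then a top-down global search seeded by the local maximal frequent itemsets) can be sketched roughly as follows. This is a single-process illustration under stated assumptions, not the authors' implementation: partitions are plain Python lists standing in for cluster nodes, local mining is brute force, a plain set replaces the paper's prefix tree, support is a relative fraction `min_frac`, and all function names are hypothetical.

```python
from itertools import combinations


def support(itemset, db):
    """Count the transactions in db that contain itemset."""
    return sum(1 for t in db if itemset <= t)


def local_maximal(db, min_frac):
    """Local phase (brute force): maximal frequent itemsets of one partition."""
    items = sorted(set().union(*db))
    threshold = min_frac * len(db)
    frequent = [frozenset(c)
                for k in range(1, len(items) + 1)
                for c in combinations(items, k)
                if support(frozenset(c), db) >= threshold]
    return {s for s in frequent if not any(s < o for o in frequent)}


def global_maximal(partitions, min_frac):
    """Global phase: top-down search starting from the local maximal itemsets."""
    threshold = min_frac * sum(len(db) for db in partitions)
    # A globally frequent itemset must be frequent in at least one partition,
    # so it is a subset of some local maximal frequent itemset.
    candidates = set()
    for db in partitions:
        candidates |= local_maximal(db, min_frac)
    frequent, seen = set(), set()
    while candidates:
        next_level = set()
        for c in candidates:
            if c in seen:
                continue
            seen.add(c)
            if any(c <= f for f in frequent):
                continue  # covered by a known frequent itemset, cannot be maximal
            if sum(support(c, db) for db in partitions) >= threshold:
                frequent.add(c)
            elif len(c) > 1:
                # Infrequent: descend to subsets with one item removed.
                next_level |= {c - {i} for i in c}
        candidates = next_level
    return {s for s in frequent if not any(s < o for o in frequent)}


# Two toy partitions (assumed data) playing the role of two nodes.
partitions = [
    [frozenset("abc"), frozenset("abc"), frozenset("ab")],
    [frozenset("ab"), frozenset("cd"), frozenset("cd")],
]
result = global_maximal(partitions, min_frac=0.5)
```

In this sketch the only data a node would need to share before the global phase is its set of local maximal itemsets, which is one plausible reading of why the description reports low communication overhead; the paper itself should be consulted for the actual protocol and the prefix tree layout.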