Parallel and Distributed Frequent Pattern Mining in Large Databases

Recently, a significant number of parallel and distributed algorithms have been proposed to mine frequent patterns (FP) from large and/or distributed databases. Among them parallelization of the FP-growth algorithms using the FP-tree has been proved to be highly efficient. However, the FP-tree-based...

Bibliographic Details
Main Authors: Tanbeer, S.K., Ahmed, C.F., Byeong-Soo Jeong
Format: Conference Proceeding
Language: eng
container_end_page 414
container_issue
container_start_page 407
container_title
container_volume
creator Tanbeer, S.K.
Ahmed, C.F.
Byeong-Soo Jeong
description Recently, a significant number of parallel and distributed algorithms have been proposed to mine frequent patterns (FP) from large and/or distributed databases. Among them parallelization of the FP-growth algorithms using the FP-tree has been proved to be highly efficient. However, the FP-tree-based techniques suffer from two major limitations such as multiple database scans requirement (i.e., high I/O cost) and high inter-processor communications cost (during the mining phase). Therefore, we propose a novel tree structure, called PP-tree (Parallel Pattern tree) that significantly reduces the I/O cost by capturing the database contents with a single scan and facilitates the efficient FP-growth mining on it with reduced inter-processor communication overhead. Our parallel algorithm works independently at each local site and locally generates global frequent patterns which are merged at the final stage. The experimental results reflect that parallel and distributed FP mining with PP-tree outperforms other state-of-the-art algorithms.
doi_str_mv 10.1109/HPCC.2009.37
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5167021</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5167021</ieee_id><sourcerecordid>5167021</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-e5844df812a6a4d1072a82de88df3813755c7ac61d168e0f851db61df3d24f433</originalsourceid><addsrcrecordid>eNotzL1OwzAUQGEjhAQt3dhY_AIp9_o_I0ppixREBpirm_qmshQicNyBtwcJpqNvOULcIawRoX7Yd02zVgD1WvsLsQDvaqu9DvpSLNAoY4wD8NdiNc-pBw3gtEF1I5qOMo0jj5KmKDdpLjn158JRbjN_nXkqsqNSOE_yJU1pOsk0yZbyieWGCvU083wrrgYaZ179dynet09vzb5qX3fPzWNbJfS2VGyDMXEIqMiRiQheUVCRQ4iDDqi9tUdPR4cRXWAYgsXY_2rQUZnBaL0U93_fxMyHz5w-KH8fLDoPCvUPyjxJPw</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Parallel and Distributed Frequent Pattern Mining in Large Databases</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Tanbeer, S.K. ; Ahmed, C.F. ; Byeong-Soo Jeong</creator><creatorcontrib>Tanbeer, S.K. ; Ahmed, C.F. ; Byeong-Soo Jeong</creatorcontrib><description>Recently, a significant number of parallel and distributed algorithms have been proposed to mine frequent patterns (FP) from large and/or distributed databases. Among them parallelization of the FP-growth algorithms using the FP-tree has been proved to be highly efficient. However, the FP-tree-based techniques suffer from two major limitations such as multiple database scans requirement (i.e., high I/O cost) and high inter-processor communications cost (during the mining phase). Therefore, we propose a novel tree structure, called PP-tree (Parallel Pattern tree) that significantly reduces the I/O cost by capturing the database contents with a single scan and facilitates the efficient FP-growth mining on it with reduced inter-processor communication overhead. 
Our parallel algorithm works independently at each local site and locally generates global frequent patterns which are merged at the final stage. The experimental results reflect that parallel and distributed FP mining with PP-tree outperforms other state-of-the-art algorithms.</description><identifier>ISBN: 1424446007</identifier><identifier>ISBN: 9781424446001</identifier><identifier>EISBN: 0769537383</identifier><identifier>EISBN: 9780769537382</identifier><identifier>DOI: 10.1109/HPCC.2009.37</identifier><language>eng</language><publisher>IEEE</publisher><subject>Broadcasting ; Concurrent computing ; Costs ; Data engineering ; Data mining ; Distributed computing ; Distributed databases ; Frequency ; High performance computing ; Tree data structures</subject><ispartof>2009 11th IEEE International Conference on High Performance Computing and Communications, 2009, p.407-414</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5167021$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>310,311,782,786,791,792,2060,27932,54927</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5167021$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Tanbeer, S.K.</creatorcontrib><creatorcontrib>Ahmed, C.F.</creatorcontrib><creatorcontrib>Byeong-Soo Jeong</creatorcontrib><title>Parallel and Distributed Frequent Pattern Mining in Large Databases</title><title>2009 11th IEEE International Conference on High Performance Computing and Communications</title><addtitle>HPCC</addtitle><description>Recently, a significant number of parallel and distributed algorithms have been proposed to mine frequent patterns (FP) from large and/or distributed databases. 
Among them parallelization of the FP-growth algorithms using the FP-tree has been proved to be highly efficient. However, the FP-tree-based techniques suffer from two major limitations such as multiple database scans requirement (i.e., high I/O cost) and high inter-processor communications cost (during the mining phase). Therefore, we propose a novel tree structure, called PP-tree (Parallel Pattern tree) that significantly reduces the I/O cost by capturing the database contents with a single scan and facilitates the efficient FP-growth mining on it with reduced inter-processor communication overhead. Our parallel algorithm works independently at each local site and locally generates global frequent patterns which are merged at the final stage. The experimental results reflect that parallel and distributed FP mining with PP-tree outperforms other state-of-the-art algorithms.</description><subject>Broadcasting</subject><subject>Concurrent computing</subject><subject>Costs</subject><subject>Data engineering</subject><subject>Data mining</subject><subject>Distributed computing</subject><subject>Distributed databases</subject><subject>Frequency</subject><subject>High performance computing</subject><subject>Tree data structures</subject><isbn>1424446007</isbn><isbn>9781424446001</isbn><isbn>0769537383</isbn><isbn>9780769537382</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2009</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNotzL1OwzAUQGEjhAQt3dhY_AIp9_o_I0ppixREBpirm_qmshQicNyBtwcJpqNvOULcIawRoX7Yd02zVgD1WvsLsQDvaqu9DvpSLNAoY4wD8NdiNc-pBw3gtEF1I5qOMo0jj5KmKDdpLjn158JRbjN_nXkqsqNSOE_yJU1pOsk0yZbyieWGCvU083wrrgYaZ179dynet09vzb5qX3fPzWNbJfS2VGyDMXEIqMiRiQheUVCRQ4iDDqi9tUdPR4cRXWAYgsXY_2rQUZnBaL0U93_fxMyHz5w-KH8fLDoPCvUPyjxJPw</recordid><startdate>200906</startdate><enddate>200906</enddate><creator>Tanbeer, S.K.</creator><creator>Ahmed, 
C.F.</creator><creator>Byeong-Soo Jeong</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>200906</creationdate><title>Parallel and Distributed Frequent Pattern Mining in Large Databases</title><author>Tanbeer, S.K. ; Ahmed, C.F. ; Byeong-Soo Jeong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-e5844df812a6a4d1072a82de88df3813755c7ac61d168e0f851db61df3d24f433</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Broadcasting</topic><topic>Concurrent computing</topic><topic>Costs</topic><topic>Data engineering</topic><topic>Data mining</topic><topic>Distributed computing</topic><topic>Distributed databases</topic><topic>Frequency</topic><topic>High performance computing</topic><topic>Tree data structures</topic><toplevel>online_resources</toplevel><creatorcontrib>Tanbeer, S.K.</creatorcontrib><creatorcontrib>Ahmed, C.F.</creatorcontrib><creatorcontrib>Byeong-Soo Jeong</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Tanbeer, S.K.</au><au>Ahmed, C.F.</au><au>Byeong-Soo Jeong</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Parallel and Distributed Frequent Pattern Mining in Large Databases</atitle><btitle>2009 11th IEEE International Conference on High Performance Computing and 
Communications</btitle><stitle>HPCC</stitle><date>2009-06</date><risdate>2009</risdate><spage>407</spage><epage>414</epage><pages>407-414</pages><isbn>1424446007</isbn><isbn>9781424446001</isbn><eisbn>0769537383</eisbn><eisbn>9780769537382</eisbn><abstract>Recently, a significant number of parallel and distributed algorithms have been proposed to mine frequent patterns (FP) from large and/or distributed databases. Among them parallelization of the FP-growth algorithms using the FP-tree has been proved to be highly efficient. However, the FP-tree-based techniques suffer from two major limitations such as multiple database scans requirement (i.e., high I/O cost) and high inter-processor communications cost (during the mining phase). Therefore, we propose a novel tree structure, called PP-tree (Parallel Pattern tree) that significantly reduces the I/O cost by capturing the database contents with a single scan and facilitates the efficient FP-growth mining on it with reduced inter-processor communication overhead. Our parallel algorithm works independently at each local site and locally generates global frequent patterns which are merged at the final stage. The experimental results reflect that parallel and distributed FP mining with PP-tree outperforms other state-of-the-art algorithms.</abstract><pub>IEEE</pub><doi>10.1109/HPCC.2009.37</doi><tpages>8</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISBN: 1424446007
ispartof 2009 11th IEEE International Conference on High Performance Computing and Communications, 2009, p.407-414
issn
language eng
recordid cdi_ieee_primary_5167021
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Broadcasting
Concurrent computing
Costs
Data engineering
Data mining
Distributed computing
Distributed databases
Frequency
High performance computing
Tree data structures
title Parallel and Distributed Frequent Pattern Mining in Large Databases
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-04T23%3A17%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Parallel%20and%20Distributed%20Frequent%20Pattern%20Mining%20in%20Large%20Databases&rft.btitle=2009%2011th%20IEEE%20International%20Conference%20on%20High%20Performance%20Computing%20and%20Communications&rft.au=Tanbeer,%20S.K.&rft.date=2009-06&rft.spage=407&rft.epage=414&rft.pages=407-414&rft.isbn=1424446007&rft.isbn_list=9781424446001&rft_id=info:doi/10.1109/HPCC.2009.37&rft_dat=%3Cieee_6IE%3E5167021%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=0769537383&rft.eisbn_list=9780769537382&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5167021&rfr_iscdi=true