A Distributed Method for Fast Mining Frequent Patterns From Big Data
In recent years, knowledge discovery in databases has provided a powerful capability to discover meaningful and useful information. Frequent pattern mining and association rule mining have been studied extensively for numerous real-life applications. In traditional mining algorithms, data are centraliz...
Published in: | IEEE access 2021, Vol.9, p.135144-135159 |
---|---|
Main authors: | Huang, Peng-Yu ; Cheng, Wan-Shu ; Chen, Ju-Chin ; Chung, Wen-Yu ; Chen, Young-Lin ; Lin, Kawuu W. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 135159 |
---|---|
container_issue | |
container_start_page | 135144 |
container_title | IEEE access |
container_volume | 9 |
creator | Huang, Peng-Yu ; Cheng, Wan-Shu ; Chen, Ju-Chin ; Chung, Wen-Yu ; Chen, Young-Lin ; Lin, Kawuu W. |
description | In recent years, knowledge discovery in databases has provided a powerful capability to discover meaningful and useful information. Frequent pattern mining and association rule mining have been studied extensively for numerous real-life applications. In traditional mining algorithms, data are centralized and memory-resident. When these methods are applied to distributed databases, especially in the era of big data, their performance is inadequate because of the large amount of data and the bandwidth and energy limitations. Hence, data mining in distributed environments has emerged as an important research area. To improve performance, we propose a set of algorithms based on FP-growth that discover frequent patterns (FPs) and provide fast, scalable service in distributed computing environments, together with a compact data structure that stores items and counts to minimize the data transmitted over the network. To ensure completeness and execution capability, DistEclat and BigFIM were chosen for the experimental comparison. Experiments show that the proposed method is more cost-effective for processing massive datasets and performs well under various experimental conditions. On average, the proposed method required only 33% of the execution time and 45% of the transmission cost of DistEclat, and 23.3% of the execution time and 14.2% of the transmission cost of BigFIM. |
doi_str_mv | 10.1109/ACCESS.2021.3115514 |
format | Article |
fullrecord | <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_9548089</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9548089</ieee_id><doaj_id>oai_doaj_org_article_14dac2691ed942c89126e46cb2bb3fcf</doaj_id><sourcerecordid>2580100250</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-a5275b5fc51de8e52b6f7341a439e52fba1c475c0b6386589cc7835630f255a93</originalsourceid><addsrcrecordid>eNpNUE1LAzEQXURB0f4CLwHPrfmabHKsrdWColA9h2w2qSm60SQ9-O-NrohzmZnHvDczr2nOCZ4RgtXlfLG43mxmFFMyY4QAEH7QnFAi1JQBE4f_6uNmkvMO15AVgvakWc7RMuSSQrcvrkf3rrzEHvmY0Mrkgu7DEIYtWiX3sXdDQY-mFJeGXJH4hq7CFi1NMWfNkTev2U1-82nzvLp-WtxO7x5u1ov53dRyLMvUAG2hA2-B9E46oJ3wLePEcKZq5ztDLG_B4k4wKUAqa1vJQDDsKYBR7LRZj7p9NDv9nsKbSZ86mqB_gJi22qQS7KvThPfGUqGI6xWnVipChePCdrTrmLe-al2MWu8p1t9y0bu4T0M9X1OQmGBMAdcpNk7ZFHNOzv9tJVh_u69H9_W3-_rX_co6H1nBOffHUMAllop9AWeGfgI</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2580100250</pqid></control><display><type>article</type><title>A Distributed Method for Fast Mining Frequent Patterns From Big Data</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Huang, Peng-Yu ; Cheng, Wan-Shu ; Chen, Ju-Chin ; Chung, Wen-Yu ; Chen, Young-Lin ; Lin, Kawuu W.</creator><creatorcontrib>Huang, Peng-Yu ; Cheng, Wan-Shu ; Chen, Ju-Chin ; Chung, Wen-Yu ; Chen, Young-Lin ; Lin, Kawuu W.</creatorcontrib><description>In recent years, knowledge discovery in databases provides a powerful capability to discover meaningful and useful information. For numerous real-life applications, frequent pattern mining and association rule mining have been extensively studied. In traditional mining algorithms, data are centralized and memory-resident. 
As a result of the large amount of data, bandwidth limitation, and energy limitations when applying these methods to distributed databases, especially in this era of big data, the performance is not effective enough. Hence, data mining on distributed environments has emerged as an important research area. To improve the performance, we propose a set of algorithms based on FP growth that discover FPs that are capable of providing fast and scalable service in distributed computing environments and a brief data structure to store items and counts to minimize the data for transmission on the network. To ensure completeness and execution capability, DistEclat and BigFIM were considered for the experiment comparison. Experiments show that the proposed method has superior cost-effectiveness for processing massive datasets and good capabilities under various experiment conditions. The proposed method on average required only 33% of the execution time and 45% of the transmission cost of DistEclat. Compared to BigFIM, The proposed method on average required 23.3% of the execution time and 14.2% of the transmission cost of BigFIM.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2021.3115514</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Algorithms ; Artificial intelligence ; Big Data ; Computer networks ; Costs ; Data mining ; Data structures ; distributed computing ; Distributed databases ; Distributed processing ; Energy limitation ; Itemsets ; Massive data points ; Memory management ; parallel algorithms ; Pattern analysis ; Performance enhancement</subject><ispartof>IEEE access, 2021, Vol.9, p.135144-135159</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-a5275b5fc51de8e52b6f7341a439e52fba1c475c0b6386589cc7835630f255a93</citedby><cites>FETCH-LOGICAL-c408t-a5275b5fc51de8e52b6f7341a439e52fba1c475c0b6386589cc7835630f255a93</cites><orcidid>0000-0001-7126-8096 ; 0000-0002-1669-1008</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9548089$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,864,2102,4024,27633,27923,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Huang, Peng-Yu</creatorcontrib><creatorcontrib>Cheng, Wan-Shu</creatorcontrib><creatorcontrib>Chen, Ju-Chin</creatorcontrib><creatorcontrib>Chung, Wen-Yu</creatorcontrib><creatorcontrib>Chen, Young-Lin</creatorcontrib><creatorcontrib>Lin, Kawuu W.</creatorcontrib><title>A Distributed Method for Fast Mining Frequent Patterns From Big Data</title><title>IEEE access</title><addtitle>Access</addtitle><description>In recent years, knowledge discovery in databases provides a powerful capability to discover meaningful and useful information. For numerous real-life applications, frequent pattern mining and association rule mining have been extensively studied. In traditional mining algorithms, data are centralized and memory-resident. As a result of the large amount of data, bandwidth limitation, and energy limitations when applying these methods to distributed databases, especially in this era of big data, the performance is not effective enough. Hence, data mining on distributed environments has emerged as an important research area. 
To improve the performance, we propose a set of algorithms based on FP growth that discover FPs that are capable of providing fast and scalable service in distributed computing environments and a brief data structure to store items and counts to minimize the data for transmission on the network. To ensure completeness and execution capability, DistEclat and BigFIM were considered for the experiment comparison. Experiments show that the proposed method has superior cost-effectiveness for processing massive datasets and good capabilities under various experiment conditions. The proposed method on average required only 33% of the execution time and 45% of the transmission cost of DistEclat. Compared to BigFIM, The proposed method on average required 23.3% of the execution time and 14.2% of the transmission cost of BigFIM.</description><subject>Algorithms</subject><subject>Artificial intelligence</subject><subject>Big Data</subject><subject>Computer networks</subject><subject>Costs</subject><subject>Data mining</subject><subject>Data structures</subject><subject>distributed computing</subject><subject>Distributed databases</subject><subject>Distributed processing</subject><subject>Energy limitation</subject><subject>Itemsets</subject><subject>Massive data points</subject><subject>Memory management</subject><subject>parallel algorithms</subject><subject>Pattern analysis</subject><subject>Performance 
enhancement</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUE1LAzEQXURB0f4CLwHPrfmabHKsrdWColA9h2w2qSm60SQ9-O-NrohzmZnHvDczr2nOCZ4RgtXlfLG43mxmFFMyY4QAEH7QnFAi1JQBE4f_6uNmkvMO15AVgvakWc7RMuSSQrcvrkf3rrzEHvmY0Mrkgu7DEIYtWiX3sXdDQY-mFJeGXJH4hq7CFi1NMWfNkTev2U1-82nzvLp-WtxO7x5u1ov53dRyLMvUAG2hA2-B9E46oJ3wLePEcKZq5ztDLG_B4k4wKUAqa1vJQDDsKYBR7LRZj7p9NDv9nsKbSZ86mqB_gJi22qQS7KvThPfGUqGI6xWnVipChePCdrTrmLe-al2MWu8p1t9y0bu4T0M9X1OQmGBMAdcpNk7ZFHNOzv9tJVh_u69H9_W3-_rX_co6H1nBOffHUMAllop9AWeGfgI</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>Huang, Peng-Yu</creator><creator>Cheng, Wan-Shu</creator><creator>Chen, Ju-Chin</creator><creator>Chung, Wen-Yu</creator><creator>Chen, Young-Lin</creator><creator>Lin, Kawuu W.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-7126-8096</orcidid><orcidid>https://orcid.org/0000-0002-1669-1008</orcidid></search><sort><creationdate>2021</creationdate><title>A Distributed Method for Fast Mining Frequent Patterns From Big Data</title><author>Huang, Peng-Yu ; Cheng, Wan-Shu ; Chen, Ju-Chin ; Chung, Wen-Yu ; Chen, Young-Lin ; Lin, Kawuu W.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-a5275b5fc51de8e52b6f7341a439e52fba1c475c0b6386589cc7835630f255a93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Artificial intelligence</topic><topic>Big Data</topic><topic>Computer networks</topic><topic>Costs</topic><topic>Data mining</topic><topic>Data structures</topic><topic>distributed computing</topic><topic>Distributed databases</topic><topic>Distributed processing</topic><topic>Energy limitation</topic><topic>Itemsets</topic><topic>Massive data points</topic><topic>Memory management</topic><topic>parallel algorithms</topic><topic>Pattern analysis</topic><topic>Performance enhancement</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Huang, Peng-Yu</creatorcontrib><creatorcontrib>Cheng, Wan-Shu</creatorcontrib><creatorcontrib>Chen, Ju-Chin</creatorcontrib><creatorcontrib>Chung, Wen-Yu</creatorcontrib><creatorcontrib>Chen, Young-Lin</creatorcontrib><creatorcontrib>Lin, Kawuu W.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society 
Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Huang, Peng-Yu</au><au>Cheng, Wan-Shu</au><au>Chen, Ju-Chin</au><au>Chung, Wen-Yu</au><au>Chen, Young-Lin</au><au>Lin, Kawuu W.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Distributed Method for Fast Mining Frequent Patterns From Big Data</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2021</date><risdate>2021</risdate><volume>9</volume><spage>135144</spage><epage>135159</epage><pages>135144-135159</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>In recent years, knowledge discovery in databases provides a powerful capability to discover meaningful and useful information. For numerous real-life applications, frequent pattern mining and association rule mining have been extensively studied. In traditional mining algorithms, data are centralized and memory-resident. 
As a result of the large amount of data, bandwidth limitation, and energy limitations when applying these methods to distributed databases, especially in this era of big data, the performance is not effective enough. Hence, data mining on distributed environments has emerged as an important research area. To improve the performance, we propose a set of algorithms based on FP growth that discover FPs that are capable of providing fast and scalable service in distributed computing environments and a brief data structure to store items and counts to minimize the data for transmission on the network. To ensure completeness and execution capability, DistEclat and BigFIM were considered for the experiment comparison. Experiments show that the proposed method has superior cost-effectiveness for processing massive datasets and good capabilities under various experiment conditions. The proposed method on average required only 33% of the execution time and 45% of the transmission cost of DistEclat. Compared to BigFIM, The proposed method on average required 23.3% of the execution time and 14.2% of the transmission cost of BigFIM.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2021.3115514</doi><tpages>16</tpages><orcidid>https://orcid.org/0000-0001-7126-8096</orcidid><orcidid>https://orcid.org/0000-0002-1669-1008</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2021, Vol.9, p.135144-135159 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_ieee_primary_9548089 |
source | IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals |
subjects | Algorithms ; Artificial intelligence ; Big Data ; Computer networks ; Costs ; Data mining ; Data structures ; distributed computing ; Distributed databases ; Distributed processing ; Energy limitation ; Itemsets ; Massive data points ; Memory management ; parallel algorithms ; Pattern analysis ; Performance enhancement |
title | A Distributed Method for Fast Mining Frequent Patterns From Big Data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T05%3A12%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Distributed%20Method%20for%20Fast%20Mining%20Frequent%20Patterns%20From%20Big%20Data&rft.jtitle=IEEE%20access&rft.au=Huang,%20Peng-Yu&rft.date=2021&rft.volume=9&rft.spage=135144&rft.epage=135159&rft.pages=135144-135159&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3115514&rft_dat=%3Cproquest_ieee_%3E2580100250%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2580100250&rft_id=info:pmid/&rft_ieee_id=9548089&rft_doaj_id=oai_doaj_org_article_14dac2691ed942c89126e46cb2bb3fcf&rfr_iscdi=true |
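
The description mentions a compact structure that stores items and counts so that less data crosses the network. The record does not describe that structure, so the following is only a minimal sketch of the general idea, not the authors' algorithm: each node summarizes its transactions as per-item counts, and a coordinator merges those summaries and keeps the items that meet a support threshold. The function names, the sample transactions, and the threshold are all hypothetical.

```python
from collections import Counter

# Illustrative sketch only: NOT the method from the paper.
# All names and data below are assumptions made for this example.

def local_item_counts(transactions):
    """Count, on one node, how many transactions contain each item."""
    counts = Counter()
    for transaction in transactions:
        # set() ensures an item is counted once per transaction
        counts.update(set(transaction))
    return counts

def merge_counts(partials):
    """Combine per-node summaries into global item counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def frequent_items(total, min_support):
    """Keep items whose global count reaches the support threshold."""
    return {item: count for item, count in total.items() if count >= min_support}

# Two hypothetical nodes, each holding part of the database.
node_a = [["bread", "milk"], ["bread", "beer"]]
node_b = [["milk", "bread"], ["milk"]]

# Only the small count summaries would need to be transmitted,
# not the raw transactions.
partials = [local_item_counts(node_a), local_item_counts(node_b)]
result = frequent_items(merge_counts(partials), min_support=3)
print(result)
```

In this sketch only frequent single items are found; an FP-growth-style method would go on to build and mine a prefix tree over those items, which is beyond what the record itself describes.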