A Distributed Method for Fast Mining Frequent Patterns From Big Data

Bibliographic details
Published in: IEEE Access 2021, Vol. 9, pp. 135144-135159
Main authors: Huang, Peng-Yu; Cheng, Wan-Shu; Chen, Ju-Chin; Chung, Wen-Yu; Chen, Young-Lin; Lin, Kawuu W.
Format: Article
Language: English
Online access: Full text
description In recent years, knowledge discovery in databases has provided a powerful means of extracting meaningful and useful information, and frequent pattern mining and association rule mining have been studied extensively for numerous real-life applications. Traditional mining algorithms assume that data are centralized and memory-resident. When these methods are applied to distributed databases, especially in the current era of big data, the large data volume together with bandwidth and energy limitations makes their performance inadequate. Data mining in distributed environments has therefore emerged as an important research area. To improve performance, we propose a set of FP-growth-based algorithms that discover frequent patterns (FPs) quickly and scalably in distributed computing environments, together with a compact data structure that stores items and their counts to minimize the data transmitted over the network. To ensure completeness and execution capability, DistEclat and BigFIM were selected for the experimental comparison. Experiments show that the proposed method is more cost-effective for processing massive datasets and performs well under various experimental conditions. On average, the proposed method required only 33% of the execution time and 45% of the transmission cost of DistEclat, and only 23.3% of the execution time and 14.2% of the transmission cost of BigFIM.
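The abstract states that the method builds on FP-growth and exchanges a compact item/count structure between nodes, but it does not say how the work is partitioned. The Python sketch below is therefore an illustration under stated assumptions, not the authors' algorithm: it pairs classic FP-growth with the group-based sharding of the well-known Parallel FP-Growth (PFP) scheme and simulates all nodes in a single process, with no real network or cluster framework. The message format (a sorted list of `(item, count)` pairs) and the names `local_count_message`, `merge_counts`, `distributed_mine`, and `num_groups` are hypothetical.

```python
# Sketch only: PFP-style sharding + classic FP-growth; not the paper's algorithm.
from collections import Counter, defaultdict


class Node:
    """FP-tree node: item, accumulated count, parent link, children keyed by item."""
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}


def build_tree(weighted_txns, minsup):
    """Build an FP-tree from (items, weight) pairs; return the header table and supports."""
    counts = Counter()
    for items, weight in weighted_txns:
        for item in items:
            counts[item] += weight
    freq = {i: c for i, c in counts.items() if c >= minsup}
    root, header = Node(None, None), defaultdict(list)
    for items, weight in weighted_txns:
        node = root
        # Insert frequent items only, most frequent first (ties broken by item name).
        for item in sorted((i for i in items if i in freq), key=lambda i: (-freq[i], i)):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header[item].append(child)
            child.count += weight
            node = child
    return header, freq


def fp_growth(weighted_txns, minsup, suffix=()):
    """Classic FP-growth: return {frozenset(itemset): support} for all frequent itemsets."""
    header, freq = build_tree(weighted_txns, minsup)
    results = {}
    for item, support in freq.items():
        results[frozenset((item,) + suffix)] = support
        # Conditional pattern base: the prefix path above each tree node holding `item`.
        cond_base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                cond_base.append((path, node.count))
        if cond_base:
            results.update(fp_growth(cond_base, minsup, suffix + (item,)))
    return results


def local_count_message(partition):
    """Hypothetical compact per-node message: sorted (item, count) pairs."""
    return sorted(Counter(i for txn in partition for i in set(txn)).items())


def merge_counts(messages):
    """Coordinator side: sum the (item, count) pairs received from all nodes."""
    total = Counter()
    for message in messages:
        total.update(dict(message))  # Counter.update adds counts
    return total


def distributed_mine(partitions, minsup, num_groups=2):
    """Simulate two-phase mining; `partitions` holds one transaction list per node."""
    # Phase 1: exchange only item counts, then fix the global F-list order.
    counts = merge_counts(local_count_message(p) for p in partitions)
    flist = sorted((i for i, c in counts.items() if c >= minsup), key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(flist)}
    group = {item: r % num_groups for item, r in rank.items()}
    # Phase 2 (map): per group, send the transaction prefix that ends at the
    # group's last item, so each shard sees exact supports for its own items.
    shards = defaultdict(list)
    for partition in partitions:
        for txn in partition:
            items = sorted({i for i in txn if i in rank}, key=rank.get)
            sent = set()
            for j in range(len(items) - 1, -1, -1):
                g = group[items[j]]
                if g not in sent:
                    sent.add(g)
                    shards[g].append((items[: j + 1], 1))
    # Phase 2 (reduce): each shard runs FP-growth and keeps the itemsets whose
    # last item in F-list order belongs to the shard's group.
    results = {}
    for g, txns in shards.items():
        for itemset, support in fp_growth(txns, minsup).items():
            if group[max(itemset, key=rank.get)] == g:
                results[itemset] = support
    return results


if __name__ == "__main__":
    nodes = [[["a", "b", "c"], ["a", "b"], ["b", "d"]],
             [["a", "c"], ["a", "b", "c", "d"], ["b", "c"]]]
    found = distributed_mine(nodes, minsup=2)
    for itemset in sorted(found, key=lambda s: (len(s), sorted(s))):
        print(sorted(itemset), found[itemset])
```

In this sketch, phase 1 transfers only per-item counts, the kind of traffic a compact item/count structure is meant to keep small. Phase 2 sends each transaction to at most `num_groups` shards, and each shard reports only the itemsets whose last item in F-list order falls in its own group, so the shards can mine independently without double counting.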
doi_str_mv 10.1109/ACCESS.2021.3115514
publisher IEEE, Piscataway
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021
coden IAECCG
pages 16
orcid https://orcid.org/0000-0001-7126-8096; https://orcid.org/0000-0002-1669-1008
access Peer reviewed; free to read (open access)
html https://ieeexplore.ieee.org/document/9548089
identifier ISSN: 2169-3536
recordid cdi_ieee_primary_9548089
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals
subjects Algorithms
Artificial intelligence
Big Data
Computer networks
Costs
Data mining
Data structures
distributed computing
Distributed databases
Distributed processing
Energy limitation
Itemsets
Massive data points
Memory management
parallel algorithms
Pattern analysis
Performance enhancement
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T05%3A12%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Distributed%20Method%20for%20Fast%20Mining%20Frequent%20Patterns%20From%20Big%20Data&rft.jtitle=IEEE%20access&rft.au=Huang,%20Peng-Yu&rft.date=2021&rft.volume=9&rft.spage=135144&rft.epage=135159&rft.pages=135144-135159&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3115514&rft_dat=%3Cproquest_ieee_%3E2580100250%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2580100250&rft_id=info:pmid/&rft_ieee_id=9548089&rft_doaj_id=oai_doaj_org_article_14dac2691ed942c89126e46cb2bb3fcf&rfr_iscdi=true