A Distributed Method for Fast Mining Frequent Patterns From Big Data

In recent years, knowledge discovery in databases has provided a powerful capability to discover meaningful and useful information. For numerous real-life applications, frequent pattern mining and association rule mining have been extensively studied. In traditional mining algorithms, data are centraliz...

Bibliographic details
Published in:	IEEE access 2021, Vol.9, p.135144-135159
Main authors:	Huang, Peng-Yu; Cheng, Wan-Shu; Chen, Ju-Chin; Chung, Wen-Yu; Chen, Young-Lin; Lin, Kawuu W.
Format:	Article
Language:	eng
Subjects:	Algorithms; Artificial intelligence; Big Data; Computer networks; Costs; Data mining; Data structures; distributed computing; Distributed databases; Distributed processing; Energy limitation; Itemsets; Massive data points; Memory management; parallel algorithms; Pattern analysis; Performance enhancement
Online access:	Full text

container_end_page	135159
container_issue
container_start_page	135144
container_title	IEEE access
container_volume	9
creator	Huang, Peng-Yu; Cheng, Wan-Shu; Chen, Ju-Chin; Chung, Wen-Yu; Chen, Young-Lin; Lin, Kawuu W.
description	In recent years, knowledge discovery in databases has provided a powerful capability to discover meaningful and useful information. Frequent pattern mining and association rule mining have been studied extensively for numerous real-life applications. Traditional mining algorithms assume that data are centralized and memory-resident. When these methods are applied to distributed databases, especially in the era of big data, the large data volume and the bandwidth and energy limitations make their performance inadequate. Hence, data mining in distributed environments has emerged as an important research area. To improve performance, we propose a set of algorithms based on FP-growth that discover frequent patterns (FPs) and provide fast, scalable service in distributed computing environments, together with a compact data structure that stores items and counts to minimize the data transmitted over the network. To ensure completeness and execution capability, DistEclat and BigFIM were used for the experimental comparison. Experiments show that the proposed method is more cost-effective for processing massive datasets and performs well under various experimental conditions. On average, the proposed method required only 33% of the execution time and 45% of the transmission cost of DistEclat, and 23.3% of the execution time and 14.2% of the transmission cost of BigFIM.
doi_str_mv	10.1109/ACCESS.2021.3115514
format	Article
fullrecord	<record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_9548089</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9548089</ieee_id><doaj_id>oai_doaj_org_article_14dac2691ed942c89126e46cb2bb3fcf</doaj_id><sourcerecordid>2580100250</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-a5275b5fc51de8e52b6f7341a439e52fba1c475c0b6386589cc7835630f255a93</originalsourceid><addsrcrecordid>eNpNUE1LAzEQXURB0f4CLwHPrfmabHKsrdWColA9h2w2qSm60SQ9-O-NrohzmZnHvDczr2nOCZ4RgtXlfLG43mxmFFMyY4QAEH7QnFAi1JQBE4f_6uNmkvMO15AVgvakWc7RMuSSQrcvrkf3rrzEHvmY0Mrkgu7DEIYtWiX3sXdDQY-mFJeGXJH4hq7CFi1NMWfNkTev2U1-82nzvLp-WtxO7x5u1ov53dRyLMvUAG2hA2-B9E46oJ3wLePEcKZq5ztDLG_B4k4wKUAqa1vJQDDsKYBR7LRZj7p9NDv9nsKbSZ86mqB_gJi22qQS7KvThPfGUqGI6xWnVipChePCdrTrmLe-al2MWu8p1t9y0bu4T0M9X1OQmGBMAdcpNk7ZFHNOzv9tJVh_u69H9_W3-_rX_co6H1nBOffHUMAllop9AWeGfgI</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2580100250</pqid></control><display><type>article</type><title>A Distributed Method for Fast Mining Frequent Patterns From Big Data</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Huang, Peng-Yu ; Cheng, Wan-Shu ; Chen, Ju-Chin ; Chung, Wen-Yu ; Chen, Young-Lin ; Lin, Kawuu W.</creator><creatorcontrib>Huang, Peng-Yu ; Cheng, Wan-Shu ; Chen, Ju-Chin ; Chung, Wen-Yu ; Chen, Young-Lin ; Lin, Kawuu W.</creatorcontrib><description>In recent years, knowledge discovery in databases provides a powerful capability to discover meaningful and useful information. For numerous real-life applications, frequent pattern mining and association rule mining have been extensively studied. In traditional mining algorithms, data are centralized and memory-resident. 
As a result of the large amount of data, bandwidth limitation, and energy limitations when applying these methods to distributed databases, especially in this era of big data, the performance is not effective enough. Hence, data mining on distributed environments has emerged as an important research area. To improve the performance, we propose a set of algorithms based on FP growth that discover FPs that are capable of providing fast and scalable service in distributed computing environments and a brief data structure to store items and counts to minimize the data for transmission on the network. To ensure completeness and execution capability, DistEclat and BigFIM were considered for the experiment comparison. Experiments show that the proposed method has superior cost-effectiveness for processing massive datasets and good capabilities under various experiment conditions. The proposed method on average required only 33% of the execution time and 45% of the transmission cost of DistEclat. Compared to BigFIM, The proposed method on average required 23.3% of the execution time and 14.2% of the transmission cost of BigFIM.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2021.3115514</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Algorithms ; Artificial intelligence ; Big Data ; Computer networks ; Costs ; Data mining ; Data structures ; distributed computing ; Distributed databases ; Distributed processing ; Energy limitation ; Itemsets ; Massive data points ; Memory management ; parallel algorithms ; Pattern analysis ; Performance enhancement</subject><ispartof>IEEE access, 2021, Vol.9, p.135144-135159</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-a5275b5fc51de8e52b6f7341a439e52fba1c475c0b6386589cc7835630f255a93</citedby><cites>FETCH-LOGICAL-c408t-a5275b5fc51de8e52b6f7341a439e52fba1c475c0b6386589cc7835630f255a93</cites><orcidid>0000-0001-7126-8096 ; 0000-0002-1669-1008</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9548089$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,864,2102,4024,27633,27923,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Huang, Peng-Yu</creatorcontrib><creatorcontrib>Cheng, Wan-Shu</creatorcontrib><creatorcontrib>Chen, Ju-Chin</creatorcontrib><creatorcontrib>Chung, Wen-Yu</creatorcontrib><creatorcontrib>Chen, Young-Lin</creatorcontrib><creatorcontrib>Lin, Kawuu W.</creatorcontrib><title>A Distributed Method for Fast Mining Frequent Patterns From Big Data</title><title>IEEE access</title><addtitle>Access</addtitle><description>In recent years, knowledge discovery in databases provides a powerful capability to discover meaningful and useful information. For numerous real-life applications, frequent pattern mining and association rule mining have been extensively studied. In traditional mining algorithms, data are centralized and memory-resident. As a result of the large amount of data, bandwidth limitation, and energy limitations when applying these methods to distributed databases, especially in this era of big data, the performance is not effective enough. Hence, data mining on distributed environments has emerged as an important research area. 
To improve the performance, we propose a set of algorithms based on FP growth that discover FPs that are capable of providing fast and scalable service in distributed computing environments and a brief data structure to store items and counts to minimize the data for transmission on the network. To ensure completeness and execution capability, DistEclat and BigFIM were considered for the experiment comparison. Experiments show that the proposed method has superior cost-effectiveness for processing massive datasets and good capabilities under various experiment conditions. The proposed method on average required only 33% of the execution time and 45% of the transmission cost of DistEclat. Compared to BigFIM, The proposed method on average required 23.3% of the execution time and 14.2% of the transmission cost of BigFIM.</description><subject>Algorithms</subject><subject>Artificial intelligence</subject><subject>Big Data</subject><subject>Computer networks</subject><subject>Costs</subject><subject>Data mining</subject><subject>Data structures</subject><subject>distributed computing</subject><subject>Distributed databases</subject><subject>Distributed processing</subject><subject>Energy limitation</subject><subject>Itemsets</subject><subject>Massive data points</subject><subject>Memory management</subject><subject>parallel algorithms</subject><subject>Pattern analysis</subject><subject>Performance 
enhancement</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUE1LAzEQXURB0f4CLwHPrfmabHKsrdWColA9h2w2qSm60SQ9-O-NrohzmZnHvDczr2nOCZ4RgtXlfLG43mxmFFMyY4QAEH7QnFAi1JQBE4f_6uNmkvMO15AVgvakWc7RMuSSQrcvrkf3rrzEHvmY0Mrkgu7DEIYtWiX3sXdDQY-mFJeGXJH4hq7CFi1NMWfNkTev2U1-82nzvLp-WtxO7x5u1ov53dRyLMvUAG2hA2-B9E46oJ3wLePEcKZq5ztDLG_B4k4wKUAqa1vJQDDsKYBR7LRZj7p9NDv9nsKbSZ86mqB_gJi22qQS7KvThPfGUqGI6xWnVipChePCdrTrmLe-al2MWu8p1t9y0bu4T0M9X1OQmGBMAdcpNk7ZFHNOzv9tJVh_u69H9_W3-_rX_co6H1nBOffHUMAllop9AWeGfgI</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>Huang, Peng-Yu</creator><creator>Cheng, Wan-Shu</creator><creator>Chen, Ju-Chin</creator><creator>Chung, Wen-Yu</creator><creator>Chen, Young-Lin</creator><creator>Lin, Kawuu W.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-7126-8096</orcidid><orcidid>https://orcid.org/0000-0002-1669-1008</orcidid></search><sort><creationdate>2021</creationdate><title>A Distributed Method for Fast Mining Frequent Patterns From Big Data</title><author>Huang, Peng-Yu ; Cheng, Wan-Shu ; Chen, Ju-Chin ; Chung, Wen-Yu ; Chen, Young-Lin ; Lin, Kawuu W.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-a5275b5fc51de8e52b6f7341a439e52fba1c475c0b6386589cc7835630f255a93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Artificial intelligence</topic><topic>Big Data</topic><topic>Computer networks</topic><topic>Costs</topic><topic>Data mining</topic><topic>Data structures</topic><topic>distributed computing</topic><topic>Distributed databases</topic><topic>Distributed processing</topic><topic>Energy limitation</topic><topic>Itemsets</topic><topic>Massive data points</topic><topic>Memory management</topic><topic>parallel algorithms</topic><topic>Pattern analysis</topic><topic>Performance enhancement</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Huang, Peng-Yu</creatorcontrib><creatorcontrib>Cheng, Wan-Shu</creatorcontrib><creatorcontrib>Chen, Ju-Chin</creatorcontrib><creatorcontrib>Chung, Wen-Yu</creatorcontrib><creatorcontrib>Chen, Young-Lin</creatorcontrib><creatorcontrib>Lin, Kawuu W.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society 
Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Huang, Peng-Yu</au><au>Cheng, Wan-Shu</au><au>Chen, Ju-Chin</au><au>Chung, Wen-Yu</au><au>Chen, Young-Lin</au><au>Lin, Kawuu W.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Distributed Method for Fast Mining Frequent Patterns From Big Data</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2021</date><risdate>2021</risdate><volume>9</volume><spage>135144</spage><epage>135159</epage><pages>135144-135159</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>In recent years, knowledge discovery in databases provides a powerful capability to discover meaningful and useful information. For numerous real-life applications, frequent pattern mining and association rule mining have been extensively studied. In traditional mining algorithms, data are centralized and memory-resident. 
As a result of the large amount of data, bandwidth limitation, and energy limitations when applying these methods to distributed databases, especially in this era of big data, the performance is not effective enough. Hence, data mining on distributed environments has emerged as an important research area. To improve the performance, we propose a set of algorithms based on FP growth that discover FPs that are capable of providing fast and scalable service in distributed computing environments and a brief data structure to store items and counts to minimize the data for transmission on the network. To ensure completeness and execution capability, DistEclat and BigFIM were considered for the experiment comparison. Experiments show that the proposed method has superior cost-effectiveness for processing massive datasets and good capabilities under various experiment conditions. The proposed method on average required only 33% of the execution time and 45% of the transmission cost of DistEclat. Compared to BigFIM, The proposed method on average required 23.3% of the execution time and 14.2% of the transmission cost of BigFIM.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2021.3115514</doi><tpages>16</tpages><orcidid>https://orcid.org/0000-0001-7126-8096</orcidid><orcidid>https://orcid.org/0000-0002-1669-1008</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 2169-3536
ispartof	IEEE access, 2021, Vol.9, p.135144-135159
issn	2169-3536 2169-3536
language	eng
recordid	cdi_ieee_primary_9548089
source	IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals
subjects	Algorithms; Artificial intelligence; Big Data; Computer networks; Costs; Data mining; Data structures; distributed computing; Distributed databases; Distributed processing; Energy limitation; Itemsets; Massive data points; Memory management; parallel algorithms; Pattern analysis; Performance enhancement
title	A Distributed Method for Fast Mining Frequent Patterns From Big Data
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T05%3A12%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Distributed%20Method%20for%20Fast%20Mining%20Frequent%20Patterns%20From%20Big%20Data&rft.jtitle=IEEE%20access&rft.au=Huang,%20Peng-Yu&rft.date=2021&rft.volume=9&rft.spage=135144&rft.epage=135159&rft.pages=135144-135159&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3115514&rft_dat=%3Cproquest_ieee_%3E2580100250%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2580100250&rft_id=info:pmid/&rft_ieee_id=9548089&rft_doaj_id=oai_doaj_org_article_14dac2691ed942c89126e46cb2bb3fcf&rfr_iscdi=true
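
The description field reports the proposed method's cost as a percentage of each baseline's cost. As a reading aid, the short sketch below restates those percentages as "times cheaper" factors. Only the four ratios come from the record; the names `REPORTED`, `improvement_factor`, and `summarize` are hypothetical and chosen for illustration, and the sketch is not part of the authors' method.

```python
# Ratios copied from the description field: the proposed method's average cost
# as a fraction of each baseline's cost. All identifiers here are illustrative.
REPORTED = {
    "DistEclat": {"execution_time": 0.33, "transmission_cost": 0.45},
    "BigFIM": {"execution_time": 0.233, "transmission_cost": 0.142},
}


def improvement_factor(ratio: float) -> float:
    """Turn a 'proposed / baseline' cost ratio into a 'times cheaper' factor."""
    if ratio <= 0:
        raise ValueError("cost ratio must be positive")
    return 1.0 / ratio


def summarize(reported: dict) -> dict:
    """Return the same nested mapping with each ratio replaced by its factor."""
    return {
        baseline: {
            metric: round(improvement_factor(ratio), 2)
            for metric, ratio in metrics.items()
        }
        for baseline, metrics in reported.items()
    }


if __name__ == "__main__":
    for baseline, metrics in summarize(REPORTED).items():
        for metric, factor in metrics.items():
            print(f"{baseline:9s} {metric:17s} about {factor}x lower")
```

Read this way, 33% of DistEclat's execution time corresponds to roughly a threefold reduction, and 14.2% of BigFIM's transmission cost to roughly a sevenfold reduction. These are averages as reported in the abstract, not per-dataset results.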