Analysis of frequent itemset generation based on trie data structure in Apriori algorithm
Apriori is one technique of data mining association rules that aims to extract correlations between sets of items in the transaction database. The main problem with the Apriori algorithm is the process of scanning databases repeatedly to generate itemset candidates. This research examines the combin...
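Published in: | Telkomnika 2021-10, Vol.19 (5), p.1553-1564 |
---|---|
Main Authors: | Hodijah, Ade ; Setijohatmo, Urip Teguh |
Format: | Article |
Language: | English |
Subjects: | Algorithms ; Candidates ; Data mining ; Data structures ; Experiments ; Mathematical analysis ; Nodes ; Run time (computers) |
Online Access: | Full text |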
container_end_page | 1564 |
---|---|
container_issue | 5 |
container_start_page | 1553 |
container_title | Telkomnika |
container_volume | 19 |
creator | Hodijah, Ade ; Setijohatmo, Urip Teguh |
description | Apriori is one technique of data mining association rules that aims to extract correlations between sets of items in the transaction database. The main problem with the Apriori algorithm is the process of scanning databases repeatedly to generate itemset candidates. This research examines the combination of pruning by using the trie approach and multi-thread implementation in three algorithms to obtain frequent itemset. Trie is a data structure in the form of an ordered tree to store a set of strings where every node in the tree contains the same prefix. The use of a full combination trie (different from frequent pattern (FP) tree using links) allows the implementation of arrays and the hash calculation to achieve the addressing of itemset combination. In this research, the measure to get the address is called Hash-node calculation used to update support value. For these three alternatives, run time processing is analyzed based on the number of itemset combinations and transaction data at a certain minimum support value. The experimental results show that an algorithm that exploits resource capabilities by applying multi-thread performs almost seven times better than an algorithm implemented in single-thread in calculating hash-node. The fastest run time of the multi-thread approach is 43 minutes with 150-itemset combinations on 100,000 transaction data. |
doi_str_mv | 10.12928/telkomnika.v19i5.19273 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2582833422</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2582833422</sourcerecordid><originalsourceid>FETCH-LOGICAL-c1953-45839e7b22c90e40f12c9938dc567e3099336b42101e27b6c0734dc3258dac5f3</originalsourceid><addsrcrecordid>eNpFkM1LAzEQxYMoWLR_gwHPW5OZ_cqxFL-g4EUPnkI2O1vTdndrkhX63xtawYHhvcNjePNj7E6KhQQF9UOk_W7sB7czix-pXLGQCiq8YDNAAZkChZdsJkuFWVpxzeYhbEWaSkCh6hn7XA5mfwwu8LHjnafviYbIXaQ-UOQbGsib6MaBNyZQy5OJ3hFvTTQ8RD_ZOHnibuDLg3ejd9zsN0niV3_LrjqzDzT_0xv28fT4vnrJ1m_Pr6vlOrNSFZjlRY2KqgbAKkG56GQyCuvWFmVFKJLHsslBCklQNaUVFeatRSjq1tiiwxt2f7578GNqH6LejpNPXwWdMlAj5gApVZ1T1o8heOp06tsbf9RS6BNK_Y9Sn1DqE0r8BX1TayY</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2582833422</pqid></control><display><type>article</type><title>Analysis of frequent itemset generation based on trie data structure in Apriori algorithm</title><source>EZB-FREE-00999 freely available EZB journals</source><creator>Hodijah, Ade ; Setijohatmo, Urip Teguh</creator><creatorcontrib>Hodijah, Ade ; Setijohatmo, Urip Teguh</creatorcontrib><description>Apriori is one technique of data mining association rules that aims to extract correlations between sets of items in the transaction database. The main problem with the Apriori algorithm is the process of scanning databases repeatedly to generate itemset candidates. This research examines the combination of pruning by using the trie approach and multi-thread implementation in three algorithms to obtain frequent itemset. Trie is a data structure in the form of an ordered tree to store a set of strings where every node in the tree contains the same prefix. The use of a full combination trie (different from frequent pattern (FP) tree using links) allows the implementation of arrays and the hash calculation to achieve the addressing of itemset combination. 
In this research, the measure to get the address is called Hash-node calculation used to update support value. For these three alternatives, run time processing is analyzed based on the number of itemset combinations and transaction data at a certain minimum support value. The experimental results show that an algorithm that exploits resource capabilities by applying multi-thread performs almost seven times better than an algorithm implemented in single-thread in calculating hash-node. The fastest run time of the multi-thread approach is 43 minutes with 150-itemset combinations on 100,000 transaction data.</description><identifier>ISSN: 1693-6930</identifier><identifier>EISSN: 2302-9293</identifier><identifier>DOI: 10.12928/telkomnika.v19i5.19273</identifier><language>eng</language><publisher>Yogyakarta: Ahmad Dahlan University</publisher><subject>Algorithms ; Candidates ; Data mining ; Data structures ; Experiments ; Mathematical analysis ; Nodes ; Run time (computers)</subject><ispartof>Telkomnika, 2021-10, Vol.19 (5), p.1553-1564</ispartof><rights>2021. This work is published under https://creativecommons.org/licenses/by/3.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><creatorcontrib>Hodijah, Ade</creatorcontrib><creatorcontrib>Setijohatmo, Urip Teguh</creatorcontrib><title>Analysis of frequent itemset generation based on trie data structure in Apriori algorithm</title><title>Telkomnika</title><description>Apriori is one technique of data mining association rules that aims to extract correlations between sets of items in the transaction database. The main problem with the Apriori algorithm is the process of scanning databases repeatedly to generate itemset candidates. This research examines the combination of pruning by using the trie approach and multi-thread implementation in three algorithms to obtain frequent itemset. Trie is a data structure in the form of an ordered tree to store a set of strings where every node in the tree contains the same prefix. The use of a full combination trie (different from frequent pattern (FP) tree using links) allows the implementation of arrays and the hash calculation to achieve the addressing of itemset combination. In this research, the measure to get the address is called Hash-node calculation used to update support value. For these three alternatives, run time processing is analyzed based on the number of itemset combinations and transaction data at a certain minimum support value. The experimental results show that an algorithm that exploits resource capabilities by applying multi-thread performs almost seven times better than an algorithm implemented in single-thread in calculating hash-node. 
The fastest run time of the multi-thread approach is 43 minutes with 150-itemset combinations on 100,000 transaction data.</description><subject>Algorithms</subject><subject>Candidates</subject><subject>Data mining</subject><subject>Data structures</subject><subject>Experiments</subject><subject>Mathematical analysis</subject><subject>Nodes</subject><subject>Run time (computers)</subject><issn>1693-6930</issn><issn>2302-9293</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNpFkM1LAzEQxYMoWLR_gwHPW5OZ_cqxFL-g4EUPnkI2O1vTdndrkhX63xtawYHhvcNjePNj7E6KhQQF9UOk_W7sB7czix-pXLGQCiq8YDNAAZkChZdsJkuFWVpxzeYhbEWaSkCh6hn7XA5mfwwu8LHjnafviYbIXaQ-UOQbGsib6MaBNyZQy5OJ3hFvTTQ8RD_ZOHnibuDLg3ejd9zsN0niV3_LrjqzDzT_0xv28fT4vnrJ1m_Pr6vlOrNSFZjlRY2KqgbAKkG56GQyCuvWFmVFKJLHsslBCklQNaUVFeatRSjq1tiiwxt2f7578GNqH6LejpNPXwWdMlAj5gApVZ1T1o8heOp06tsbf9RS6BNK_Y9Sn1DqE0r8BX1TayY</recordid><startdate>20211001</startdate><enddate>20211001</enddate><creator>Hodijah, Ade</creator><creator>Setijohatmo, Urip Teguh</creator><general>Ahmad Dahlan University</general><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FG</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BVBZV</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>P5Z</scope><scope>P62</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope></search><sort><creationdate>20211001</creationdate><title>Analysis of frequent itemset generation based on trie data structure in Apriori algorithm</title><author>Hodijah, Ade ; Setijohatmo, Urip 
Teguh</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c1953-45839e7b22c90e40f12c9938dc567e3099336b42101e27b6c0734dc3258dac5f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Candidates</topic><topic>Data mining</topic><topic>Data structures</topic><topic>Experiments</topic><topic>Mathematical analysis</topic><topic>Nodes</topic><topic>Run time (computers)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Hodijah, Ade</creatorcontrib><creatorcontrib>Setijohatmo, Urip Teguh</creatorcontrib><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection (ProQuest)</collection><collection>East & South Asia Database</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><jtitle>Telkomnika</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Hodijah, Ade</au><au>Setijohatmo, Urip 
Teguh</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Analysis of frequent itemset generation based on trie data structure in Apriori algorithm</atitle><jtitle>Telkomnika</jtitle><date>2021-10-01</date><risdate>2021</risdate><volume>19</volume><issue>5</issue><spage>1553</spage><epage>1564</epage><pages>1553-1564</pages><issn>1693-6930</issn><eissn>2302-9293</eissn><abstract>Apriori is one technique of data mining association rules that aims to extract correlations between sets of items in the transaction database. The main problem with the Apriori algorithm is the process of scanning databases repeatedly to generate itemset candidates. This research examines the combination of pruning by using the trie approach and multi-thread implementation in three algorithms to obtain frequent itemset. Trie is a data structure in the form of an ordered tree to store a set of strings where every node in the tree contains the same prefix. The use of a full combination trie (different from frequent pattern (FP) tree using links) allows the implementation of arrays and the hash calculation to achieve the addressing of itemset combination. In this research, the measure to get the address is called Hash-node calculation used to update support value. For these three alternatives, run time processing is analyzed based on the number of itemset combinations and transaction data at a certain minimum support value. The experimental results show that an algorithm that exploits resource capabilities by applying multi-thread performs almost seven times better than an algorithm implemented in single-thread in calculating hash-node. The fastest run time of the multi-thread approach is 43 minutes with 150-itemset combinations on 100,000 transaction data.</abstract><cop>Yogyakarta</cop><pub>Ahmad Dahlan University</pub><doi>10.12928/telkomnika.v19i5.19273</doi><tpages>12</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1693-6930 |
ispartof | Telkomnika, 2021-10, Vol.19 (5), p.1553-1564 |
issn | 1693-6930 2302-9293 |
language | eng |
recordid | cdi_proquest_journals_2582833422 |
source | EZB-FREE-00999 freely available EZB journals |
subjects | Algorithms ; Candidates ; Data mining ; Data structures ; Experiments ; Mathematical analysis ; Nodes ; Run time (computers) |
title | Analysis of frequent itemset generation based on trie data structure in Apriori algorithm |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T01%3A30%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Analysis%20of%20frequent%20itemset%20generation%20based%20on%20trie%20data%20structure%20in%20Apriori%20algorithm&rft.jtitle=Telkomnika&rft.au=Hodijah,%20Ade&rft.date=2021-10-01&rft.volume=19&rft.issue=5&rft.spage=1553&rft.epage=1564&rft.pages=1553-1564&rft.issn=1693-6930&rft.eissn=2302-9293&rft_id=info:doi/10.12928/telkomnika.v19i5.19273&rft_dat=%3Cproquest_cross%3E2582833422%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2582833422&rft_id=info:pmid/&rfr_iscdi=true |
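
The description field above mentions storing itemset combinations in a trie and updating a support value per node. As a rough illustration only, the following Python sketch counts support in a trie during a single pass over the transactions and then prunes infrequent branches. It is not the authors' Hash-node calculation or their multi-thread implementation, neither of which is specified in this record; the names `TrieNode`, `build_trie`, `frequent_itemsets`, and the `max_size` cap are assumptions made for this example.

```python
from itertools import combinations


class TrieNode:
    """One node per itemset prefix (hypothetical structure for illustration)."""

    def __init__(self):
        self.children = {}  # item -> TrieNode
        self.support = 0    # number of transactions containing this itemset


def build_trie(transactions, max_size):
    """Insert every sorted item combination (up to max_size) of each transaction."""
    root = TrieNode()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                node = root
                for item in combo:
                    node = node.children.setdefault(item, TrieNode())
                # Only the node that ends this combination is incremented.
                node.support += 1
    return root


def frequent_itemsets(root, min_support):
    """Walk the trie depth-first, skipping subtrees below min_support."""
    result = {}
    stack = [((), root)]
    while stack:
        prefix, node = stack.pop()
        for item, child in node.children.items():
            # Apriori property: a superset cannot be more frequent than its prefix.
            if child.support >= min_support:
                itemset = prefix + (item,)
                result[itemset] = child.support
                stack.append((itemset, child))
    return result
```

Calling `frequent_itemsets(build_trie(transactions, 2), 2)` on a list of item lists returns a dictionary that maps each frequent itemset (as a sorted tuple) to its support count. Enumerating all combinations grows quickly with transaction length, which is why this sketch caps the itemset size.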