Single-scan: a fast star-join query processing algorithm

Summary A data warehouse can store very large amounts of data that should be processed in parallel in order to achieve reasonable query execution times. The MapReduce programming model is a very convenient way to process large amounts of data in parallel on commodity hardware clusters. A very popula...

Full description

Bibliographic details
Published in: Software, practice & experience, 2016-03, Vol.46 (3), p.319-339
Main authors: PurdilA, Vasile; Pentiuc, Stefan-Gheorghe
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 339
container_issue 3
container_start_page 319
container_title Software, practice & experience
container_volume 46
creator PurdilA, Vasile
Pentiuc, Stefan-Gheorghe
description Summary: A data warehouse can store very large amounts of data, which should be processed in parallel to achieve reasonable query execution times. The MapReduce programming model is a convenient way to process large amounts of data in parallel on commodity hardware clusters. A very popular query in data warehouses is the star-join. In this paper, we present a fast and efficient star-join query execution algorithm built on top of a MapReduce framework called Hadoop. By using dynamic filters against dimension tables, the algorithm needs only a single scan of the fact table, which significantly reduces input/output operations and computational complexity. The algorithm also requires only two MapReduce iterations in total: one to build the filters against the dimension tables and one to scan the fact table. Our experiments show that the proposed algorithm performs much better than existing solutions in terms of execution time and input/output. Copyright © 2014 John Wiley & Sons, Ltd.
doi_str_mv 10.1002/spe.2308
format Article
fullrecord <record><control><sourceid>proquest_wiley</sourceid><recordid>TN_cdi_proquest_journals_1761634085</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3939391421</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3688-a03adeca64f1b1188fed542db2842bfbd8c7f0876c87839ebe671f4e76dbe4563</originalsourceid><addsrcrecordid>eNpFkF1LwzAYhYMoOKfgTyh4nfnmo0nmnY7ZCUMHUyrehLRNZmfXzqRD--_tmOjVuXk45_AgdElgRADoddjaEWWgjtCAwFhioPz1GA0AmMIgOD9FZyGsAQiJqRggtSzrVWVxyE19E5nImdBGoTUer5uyjj531nfR1je5DaEnI1OtGl-275tzdOJMFezFbw7Ry_30eTLD86fkYXI7xzkTSmEDzBQ2N4I7khGilLNFzGmRUcVp5rJC5dKBkiJXUrGxzayQxHErRZFZHgs2RFeH3v5E_ya0et3sfN1PaiIFEYyDinsKH6ivsrKd3vpyY3ynCei9FN1L0XspermY7vOfL0Nrv_944z-0kEzGOn1M9CKdxeQuedMp-wGZDGU1</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1761634085</pqid></control><display><type>article</type><title>Single-scan: a fast star-join query processing algorithm</title><source>Access via Wiley Online Library</source><creator>PurdilA, Vasile ; Pentiuc, Stefan-Gheorghe</creator><creatorcontrib>PurdilA, Vasile ; Pentiuc, Stefan-Gheorghe</creatorcontrib><description>Summary A data warehouse can store very large amounts of data that should be processed in parallel in order to achieve reasonable query execution times. The MapReduce programming model is a very convenient way to process large amounts of data in parallel on commodity hardware clusters. A very popular query used in data warehouses is star‐join. In this paper, we present a fast and efficient star‐join query execution algorithm built on top of a MapReduce framework called Hadoop. By using dynamic filters against dimension tables, the algorithm needs a single scan of the fact table, which means a significant reduction of input/output operations and computational complexity. 
Also, the algorithm requires only two MapReduce iterations in total–one to build the filters against dimension tables and one to scan the fact table. Our experiments show that the proposed algorithm performs much better than the existing solutions in terms of execution time and input/output. Copyright © 2014 John Wiley &amp; Sons, Ltd.</description><identifier>ISSN: 0038-0644</identifier><identifier>EISSN: 1097-024X</identifier><identifier>DOI: 10.1002/spe.2308</identifier><language>eng</language><publisher>Bognor Regis: Blackwell Publishing Ltd</publisher><subject>algorithm ; Bloom filter ; data warehouse ; dimension table ; fact table ; Hadoop ; MapReduce ; parallel processing ; star-join</subject><ispartof>Software, practice &amp; experience, 2016-03, Vol.46 (3), p.319-339</ispartof><rights>Copyright © 2015 John Wiley &amp; Sons, Ltd.</rights><rights>Copyright © 2016 John Wiley &amp; Sons, Ltd.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3688-a03adeca64f1b1188fed542db2842bfbd8c7f0876c87839ebe671f4e76dbe4563</citedby><orcidid>0000-0002-5239-9493</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fspe.2308$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Fspe.2308$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,780,784,1417,27924,27925,45574,45575</link.rule.ids></links><search><creatorcontrib>PurdilA, Vasile</creatorcontrib><creatorcontrib>Pentiuc, Stefan-Gheorghe</creatorcontrib><title>Single-scan: a fast star-join query processing algorithm</title><title>Software, practice &amp; experience</title><addtitle>Softw. Pract. 
Exper</addtitle><description>Summary A data warehouse can store very large amounts of data that should be processed in parallel in order to achieve reasonable query execution times. The MapReduce programming model is a very convenient way to process large amounts of data in parallel on commodity hardware clusters. A very popular query used in data warehouses is star‐join. In this paper, we present a fast and efficient star‐join query execution algorithm built on top of a MapReduce framework called Hadoop. By using dynamic filters against dimension tables, the algorithm needs a single scan of the fact table, which means a significant reduction of input/output operations and computational complexity. Also, the algorithm requires only two MapReduce iterations in total–one to build the filters against dimension tables and one to scan the fact table. Our experiments show that the proposed algorithm performs much better than the existing solutions in terms of execution time and input/output. Copyright © 2014 John Wiley &amp; Sons, Ltd.</description><subject>algorithm</subject><subject>Bloom filter</subject><subject>data warehouse</subject><subject>dimension table</subject><subject>fact table</subject><subject>Hadoop</subject><subject>MapReduce</subject><subject>parallel processing</subject><subject>star-join</subject><issn>0038-0644</issn><issn>1097-024X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><recordid>eNpFkF1LwzAYhYMoOKfgTyh4nfnmo0nmnY7ZCUMHUyrehLRNZmfXzqRD--_tmOjVuXk45_AgdElgRADoddjaEWWgjtCAwFhioPz1GA0AmMIgOD9FZyGsAQiJqRggtSzrVWVxyE19E5nImdBGoTUer5uyjj531nfR1je5DaEnI1OtGl-275tzdOJMFezFbw7Ry_30eTLD86fkYXI7xzkTSmEDzBQ2N4I7khGilLNFzGmRUcVp5rJC5dKBkiJXUrGxzayQxHErRZFZHgs2RFeH3v5E_ya0et3sfN1PaiIFEYyDinsKH6ivsrKd3vpyY3ynCei9FN1L0XspermY7vOfL0Nrv_944z-0kEzGOn1M9CKdxeQuedMp-wGZDGU1</recordid><startdate>201603</startdate><enddate>201603</enddate><creator>PurdilA, 
Vasile</creator><creator>Pentiuc, Stefan-Gheorghe</creator><general>Blackwell Publishing Ltd</general><general>Wiley Subscription Services, Inc</general><scope>BSCLL</scope><scope>7SC</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-5239-9493</orcidid></search><sort><creationdate>201603</creationdate><title>Single-scan: a fast star-join query processing algorithm</title><author>PurdilA, Vasile ; Pentiuc, Stefan-Gheorghe</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3688-a03adeca64f1b1188fed542db2842bfbd8c7f0876c87839ebe671f4e76dbe4563</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>algorithm</topic><topic>Bloom filter</topic><topic>data warehouse</topic><topic>dimension table</topic><topic>fact table</topic><topic>Hadoop</topic><topic>MapReduce</topic><topic>parallel processing</topic><topic>star-join</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>PurdilA, Vasile</creatorcontrib><creatorcontrib>Pentiuc, Stefan-Gheorghe</creatorcontrib><collection>Istex</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Software, practice &amp; experience</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>PurdilA, 
Vasile</au><au>Pentiuc, Stefan-Gheorghe</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Single-scan: a fast star-join query processing algorithm</atitle><jtitle>Software, practice &amp; experience</jtitle><addtitle>Softw. Pract. Exper</addtitle><date>2016-03</date><risdate>2016</risdate><volume>46</volume><issue>3</issue><spage>319</spage><epage>339</epage><pages>319-339</pages><issn>0038-0644</issn><eissn>1097-024X</eissn><abstract>Summary A data warehouse can store very large amounts of data that should be processed in parallel in order to achieve reasonable query execution times. The MapReduce programming model is a very convenient way to process large amounts of data in parallel on commodity hardware clusters. A very popular query used in data warehouses is star‐join. In this paper, we present a fast and efficient star‐join query execution algorithm built on top of a MapReduce framework called Hadoop. By using dynamic filters against dimension tables, the algorithm needs a single scan of the fact table, which means a significant reduction of input/output operations and computational complexity. Also, the algorithm requires only two MapReduce iterations in total–one to build the filters against dimension tables and one to scan the fact table. Our experiments show that the proposed algorithm performs much better than the existing solutions in terms of execution time and input/output. Copyright © 2014 John Wiley &amp; Sons, Ltd.</abstract><cop>Bognor Regis</cop><pub>Blackwell Publishing Ltd</pub><doi>10.1002/spe.2308</doi><tpages>21</tpages><orcidid>https://orcid.org/0000-0002-5239-9493</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0038-0644
ispartof Software, practice & experience, 2016-03, Vol.46 (3), p.319-339
issn 0038-0644
1097-024X
language eng
recordid cdi_proquest_journals_1761634085
source Access via Wiley Online Library
subjects algorithm
Bloom filter
data warehouse
dimension table
fact table
Hadoop
MapReduce
parallel processing
star-join
title Single-scan: a fast star-join query processing algorithm
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T04%3A40%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_wiley&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Single-scan:%20a%20fast%20star-join%20query%20processing%20algorithm&rft.jtitle=Software,%20practice%20&%20experience&rft.au=PurdilA,%20Vasile&rft.date=2016-03&rft.volume=46&rft.issue=3&rft.spage=319&rft.epage=339&rft.pages=319-339&rft.issn=0038-0644&rft.eissn=1097-024X&rft_id=info:doi/10.1002/spe.2308&rft_dat=%3Cproquest_wiley%3E3939391421%3C/proquest_wiley%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1761634085&rft_id=info:pmid/&rfr_iscdi=true
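
The abstract in this record describes a two-phase approach: first build filters against the dimension tables, then scan the fact table once. The following is a minimal single-process sketch of that idea, not the authors' Hadoop implementation. It assumes the filters are Bloom filters (suggested only by the record's subject keywords), and every name in it (`BloomFilter`, `build_filter`, `single_scan_join`, the toy tables and their columns) is hypothetical and chosen for illustration. Because Bloom filters can report false positives, the sketch also keeps an exact lookup of qualifying dimension rows to confirm each match; how the paper handles this step is not stated in the record.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter backed by a Python int used as a bit set (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests of the key.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_filter(dimension_rows, key_column, predicate):
    """Phase 1 (assumed): collect keys of dimension rows that satisfy the predicate."""
    bloom = BloomFilter()
    matching = {}
    for row in dimension_rows:
        if predicate(row):
            bloom.add(row[key_column])
            matching[row[key_column]] = row
    return bloom, matching


def single_scan_join(fact_rows, filters):
    """Phase 2 (assumed): one pass over the fact table, probing every filter per row."""
    results = []
    for fact_row in fact_rows:
        # Cheap probabilistic rejection first.
        if not all(fact_row[col] in bloom for col, (bloom, _) in filters.items()):
            continue
        # Exact confirmation removes Bloom filter false positives.
        dims = [matching.get(fact_row[col]) for col, (_, matching) in filters.items()]
        if any(d is None for d in dims):
            continue
        joined = dict(fact_row)
        for d in dims:
            joined.update(d)
        results.append(joined)
    return results


# Hypothetical toy star schema: one fact table, two dimension tables.
date_dim = [{"date_id": 1, "year": 2015}, {"date_id": 2, "year": 2016}]
store_dim = [{"store_id": 10, "region": "EU"}, {"store_id": 20, "region": "US"}]
sales_fact = [
    {"date_id": 1, "store_id": 10, "amount": 5.0},
    {"date_id": 2, "store_id": 10, "amount": 7.5},
    {"date_id": 2, "store_id": 20, "amount": 3.0},
    {"date_id": 2, "store_id": 10, "amount": 1.5},
]

filters = {
    "date_id": build_filter(date_dim, "date_id", lambda r: r["year"] == 2016),
    "store_id": build_filter(store_dim, "store_id", lambda r: r["region"] == "EU"),
}
joined_rows = single_scan_join(sales_fact, filters)
total_amount = sum(row["amount"] for row in joined_rows)
```

In a distributed setting, one would presumably ship only the compact filters to the tasks that scan the fact table; the in-memory dictionaries used here for exact confirmation are a simplification for the toy data.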