Scalable and Adaptive Online Joins

Scalable join processing in a parallel shared-nothing environment requires a partitioning policy that evenly distributes the processing load while minimizing the size of state maintained and number of messages communicated. Previous research proposes static partitioning schemes that require statisti...
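
As a rough illustration of the trade-off named in the abstract (even load, small state, few messages), the sketch below assumes a grid-style partitioning scheme that this record does not itself spell out. The machines are arranged as a rows x cols grid. Each tuple of one input (called R here) is replicated to every machine of one randomly chosen row, and each tuple of the other input (S) to every machine of one randomly chosen column. Under that assumption every R/S pair meets on exactly one machine, so an arbitrary join predicate can be checked locally, and the state per machine is roughly |R|/rows + |S|/cols. All function names are hypothetical; this is a sketch under these assumptions, not the authors' operator.

```python
import random
from typing import List, Tuple


def grid_shapes(machines: int) -> List[Tuple[int, int]]:
    """Return every (rows, cols) pair with rows * cols == machines."""
    return [(n, machines // n) for n in range(1, machines + 1) if machines % n == 0]


def per_machine_load(r_size: int, s_size: int, rows: int, cols: int) -> float:
    """Approximate state per machine: one row slice of R plus one column slice of S."""
    return r_size / rows + s_size / cols


def best_shape(r_size: int, s_size: int, machines: int) -> Tuple[int, int]:
    """Pick the grid shape with the smallest per-machine load (assumed cost model)."""
    return min(
        grid_shapes(machines),
        key=lambda shape: per_machine_load(r_size, s_size, shape[0], shape[1]),
    )


def route(relation: str, shape: Tuple[int, int], rng: random.Random) -> List[int]:
    """Return the machine ids that receive one incoming tuple.

    Machines are numbered row-major: id = row * cols + col.
    An R tuple goes to a whole random row, an S tuple to a whole random column.
    """
    rows, cols = shape
    if relation == "R":
        row = rng.randrange(rows)
        return [row * cols + c for c in range(cols)]
    col = rng.randrange(cols)
    return [r * cols + col for r in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    shape = best_shape(1000, 1000, 16)
    print("shape for equal inputs:", shape)
    print("shape for skewed inputs:", best_shape(16000, 1000, 16))
    print("machines for one R tuple:", route("R", shape, rng))
```

With equal-sized inputs on 16 machines the sketch picks a 4 x 4 grid. If R grows to 16 times the size of S, a 16 x 1 grid gives less state per machine, which is the kind of reshaping an adaptive operator would have to carry out online as input sizes drift.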

Detailed description

Bibliographic details
Main authors: ElSeidy, Mohammed, Elguindy, Abdallah, Vitorovic, Aleksandar, Koch, Christoph
Format: Web Resource
Language: eng
Online access: Order full text
creator ElSeidy, Mohammed
Elguindy, Abdallah
Vitorovic, Aleksandar
Koch, Christoph
description Scalable join processing in a parallel shared-nothing environment requires a partitioning policy that evenly distributes the processing load while minimizing the size of state maintained and number of messages communicated. Previous research proposes static partitioning schemes that require statistics beforehand. In an online or streaming environment in which no statistics about the workload are known, traditional static approaches perform poorly. This paper presents a novel parallel online dataflow join operator that supports arbitrary join predicates. The proposed operator continuously adjusts itself to the data dynamics through adaptive dataflow routing and state repartitioning. The operator is resilient to data skew, maintains high throughput rates, avoids blocking behavior during state repartitioning, takes an eventual consistency approach for maintaining its local state, and behaves strongly consistently as a black-box dataflow operator. We prove that the operator ensures a constant competitive ratio of 3.75 in data distribution optimality and that the cost of processing an input tuple is amortized constant, taking into account adaptivity costs. Our evaluation demonstrates that our operator outperforms the state-of-the-art static partitioning schemes in resource utilization, throughput, and execution time.
format Web Resource
fullrecord <record><control><sourceid>epfl_F1K</sourceid><recordid>TN_cdi_epfl_infoscience_oai_infoscience_tind_io_190035</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>oai_infoscience_tind_io_190035</sourcerecordid><originalsourceid>FETCH-epfl_infoscience_oai_infoscience_tind_io_1900353</originalsourceid><addsrcrecordid>eNrjZFAKTk7MSUzKSVVIzEtRcExJLCjJLEtV8M_LycxLVfDKz8wr5mFgTUvMKU7lhdLcDGZuriHOHrqpBWk58Zl5afnFyZmpecmp8fmJmSj8ksy8lPjM_HhDSwMDY1NjsjUCAB5BOJk</addsrcrecordid><sourcetype>Institutional Repository</sourcetype><iscdi>true</iscdi><recordtype>web_resource</recordtype></control><display><type>web_resource</type><title>Scalable and Adaptive Online Joins</title><source>Infoscience: EPF Lausanne</source><creator>ElSeidy, Mohammed ; Elguindy, Abdallah ; Vitorovic, Aleksandar ; Koch, Christoph</creator><creatorcontrib>ElSeidy, Mohammed ; Elguindy, Abdallah ; Vitorovic, Aleksandar ; Koch, Christoph</creatorcontrib><description>Scalable join processing in a parallel shared-nothing environment requires a partitioning policy that evenly distributes the processing load while minimizing the size of state maintained and number of messages communicated. Previous research proposes static partitioning schemes that require statistics beforehand. In an online or streaming environment in which no statistics about the workload are known, traditional static approaches perform poorly. This paper presents a novel parallel online dataflow join operator that supports arbitrary join predicates. The proposed operator continuously adjusts itself to the data dynamics through adaptive dataflow routing and state repartitioning. The operator is resilient to data skew, maintains high throughput rates, avoids blocking behavior during state repartitioning, takes an eventual consistency approach for maintaining its local state, and behaves strongly consistently as a black-box dataflow operator. 
We prove that the operator ensures a constant competitive ratio 3.75 in data distribution optimality and that the cost of processing an input tuple is amortized constant, taking into account adaptivity costs. Our evaluation demonstrates that our operator outperforms the state-of-the-art static partitioning schemes in resource utilization, throughput, and execution time.</description><language>eng</language><publisher>Hangzhou, China, VLDB</publisher><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>315,776,27837</link.rule.ids><linktorsrc>$$Uhttp://infoscience.epfl.ch/record/190035$$EView_record_in_EPF_Lausanne$$FView_record_in_$$GEPF_Lausanne$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>ElSeidy, Mohammed</creatorcontrib><creatorcontrib>Elguindy, Abdallah</creatorcontrib><creatorcontrib>Vitorovic, Aleksandar</creatorcontrib><creatorcontrib>Koch, Christoph</creatorcontrib><title>Scalable and Adaptive Online Joins</title><description>Scalable join processing in a parallel shared-nothing environment requires a partitioning policy that evenly distributes the processing load while minimizing the size of state maintained and number of messages communicated. Previous research proposes static partitioning schemes that require statistics beforehand. In an online or streaming environment in which no statistics about the workload are known, traditional static approaches perform poorly. This paper presents a novel parallel online dataflow join operator that supports arbitrary join predicates. The proposed operator continuously adjusts itself to the data dynamics through adaptive dataflow routing and state repartitioning. 
The operator is resilient to data skew, maintains high throughput rates, avoids blocking behavior during state repartitioning, takes an eventual consistency approach for maintaining its local state, and behaves strongly consistently as a black-box dataflow operator. We prove that the operator ensures a constant competitive ratio 3.75 in data distribution optimality and that the cost of processing an input tuple is amortized constant, taking into account adaptivity costs. Our evaluation demonstrates that our operator outperforms the state-of-the-art static partitioning schemes in resource utilization, throughput, and execution time.</description><fulltext>true</fulltext><rsrctype>web_resource</rsrctype><recordtype>web_resource</recordtype><sourceid>F1K</sourceid><recordid>eNrjZFAKTk7MSUzKSVVIzEtRcExJLCjJLEtV8M_LycxLVfDKz8wr5mFgTUvMKU7lhdLcDGZuriHOHrqpBWk58Zl5afnFyZmpecmp8fmJmSj8ksy8lPjM_HhDSwMDY1NjsjUCAB5BOJk</recordid><creator>ElSeidy, Mohammed</creator><creator>Elguindy, Abdallah</creator><creator>Vitorovic, Aleksandar</creator><creator>Koch, Christoph</creator><general>Hangzhou, China, VLDB</general><scope>F1K</scope></search><sort><title>Scalable and Adaptive Online Joins</title><author>ElSeidy, Mohammed ; Elguindy, Abdallah ; Vitorovic, Aleksandar ; Koch, Christoph</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epfl_infoscience_oai_infoscience_tind_io_1900353</frbrgroupid><rsrctype>web_resources</rsrctype><prefilter>web_resources</prefilter><language>eng</language><toplevel>online_resources</toplevel><creatorcontrib>ElSeidy, Mohammed</creatorcontrib><creatorcontrib>Elguindy, Abdallah</creatorcontrib><creatorcontrib>Vitorovic, Aleksandar</creatorcontrib><creatorcontrib>Koch, Christoph</creatorcontrib><collection>Infoscience: EPF Lausanne</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>ElSeidy, Mohammed</au><au>Elguindy, Abdallah</au><au>Vitorovic, 
Aleksandar</au><au>Koch, Christoph</au><format>book</format><genre>unknown</genre><ristype>GEN</ristype><btitle>Scalable and Adaptive Online Joins</btitle><abstract>Scalable join processing in a parallel shared-nothing environment requires a partitioning policy that evenly distributes the processing load while minimizing the size of state maintained and number of messages communicated. Previous research proposes static partitioning schemes that require statistics beforehand. In an online or streaming environment in which no statistics about the workload are known, traditional static approaches perform poorly. This paper presents a novel parallel online dataflow join operator that supports arbitrary join predicates. The proposed operator continuously adjusts itself to the data dynamics through adaptive dataflow routing and state repartitioning. The operator is resilient to data skew, maintains high throughput rates, avoids blocking behavior during state repartitioning, takes an eventual consistency approach for maintaining its local state, and behaves strongly consistently as a black-box dataflow operator. We prove that the operator ensures a constant competitive ratio 3.75 in data distribution optimality and that the cost of processing an input tuple is amortized constant, taking into account adaptivity costs. Our evaluation demonstrates that our operator outperforms the state-of-the-art static partitioning schemes in resource utilization, throughput, and execution time.</abstract><pub>Hangzhou, China, VLDB</pub><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
language eng
recordid cdi_epfl_infoscience_oai_infoscience_tind_io_190035
source Infoscience: EPF Lausanne
title Scalable and Adaptive Online Joins
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T06%3A47%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epfl_F1K&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.btitle=Scalable%20and%20Adaptive%20Online%20Joins&rft.au=ElSeidy,%20Mohammed&rft_id=info:doi/&rft_dat=%3Cepfl_F1K%3Eoai_infoscience_tind_io_190035%3C/epfl_F1K%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true