Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine

Load balancing, operator instance collocations and horizontal scaling are critical issues in Parallel Stream Processing Engines to achieve low data processing latency, optimized cluster utilization and minimized communication cost respectively. In previous work, these issues are typically tackled se...

Bibliographic details
Main authors: Madsen, Kasper Grud Skat; Zhou, Yongluan; Cao, Jianneng
Format: Article
Language: eng
Subjects:
Online access: Order full text
creator Madsen, Kasper Grud Skat
Zhou, Yongluan
Cao, Jianneng
description Load balancing, operator instance collocation and horizontal scaling are critical issues in Parallel Stream Processing Engines for achieving low data processing latency, optimized cluster utilization and minimized communication cost, respectively. In previous work, these issues are typically tackled separately and independently. We argue that these problems are tightly coupled in the sense that they all need to determine the allocation of workloads and migrate computational state at runtime. Optimizing them independently would result in suboptimal solutions. Therefore, in this paper, we investigate how these three issues can be modeled as one integrated optimization problem. In particular, we first consider jobs where workload allocations have little effect on the communication cost, and model the problem of load balancing as a Mixed-Integer Linear Program. Afterwards, we present an extended solution called ALBIC, which supports general jobs. We implement the proposed techniques on top of Apache Storm, an open-source Parallel Stream Processing Engine. Extensive experimental results over both synthetic and real datasets show that our techniques clearly outperform existing approaches.
doi_str_mv 10.48550/arxiv.1602.03770
format Article
creationdate 2016-02-11
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
oa free_for_read
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.1602.03770
language eng
recordid cdi_arxiv_primary_1602_03770
source arXiv.org
subjects Computer Science - Distributed, Parallel, and Cluster Computing
title Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T07%3A00%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Integrative%20Dynamic%20Reconfiguration%20in%20a%20Parallel%20Stream%20Processing%20Engine&rft.au=Madsen,%20Kasper%20Grud%20Skat&rft.date=2016-02-11&rft_id=info:doi/10.48550/arxiv.1602.03770&rft_dat=%3Carxiv_GOX%3E1602_03770%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true