Efficient and Adaptive Stateful Replication for Stream Processing Engines in High-Availability Cluster

Stateful stream processing engines in high-availability clusters (HACs) track a large number of concurrent flow states and replicate them to backups to provide reliable functionality. Under high traffic loads, existing solutions in such HACs are expensive owing to precise stateful replication. This wor...

Bibliographic details
Published in:	IEEE transactions on parallel and distributed systems 2011-11, Vol.22 (11), p.1788-1796
Main authors:	Feng, Yi-Hsuan; Huang, Nen-Fu; Wu, Yen-Min
Format:	Article
Language:	eng
Subjects:	Adaptive estimation; adaptive method; Adaptive systems; bloom filters; Clustering methods; Clusters; Dynamical systems; Dynamics; Engines; Filters; high availability; Insertion; Multiple hash functions; Random processes; Replication; Streams

container_end_page	1796
container_issue	11
container_start_page	1788
container_title	IEEE transactions on parallel and distributed systems
container_volume	22
creator	Feng, Yi-Hsuan; Huang, Nen-Fu; Wu, Yen-Min
description	Stateful stream processing engines in high-availability clusters (HACs) track a large number of concurrent flow states and replicate them to backups to provide reliable functionality. Under high traffic loads, existing solutions in such HACs are expensive owing to precise stateful replication. This work presents two novel methods to address this issue: randomization of the replication representation and a replication scheme designed for when the system becomes overloaded. A hashing structure called Multilevel Counting Bloom Filter (MLCBF) is proposed as a low-resource solution for stateful replication. Its performance and tradeoffs are then evaluated based on theoretical analysis and extensive trace-based tests. Trace-based simulation reveals that MLCBF typically reduces the network and memory requirements of replication by over 90 percent for URL categorization. Most importantly, MLCBF is simple and practical to implement and maintain. Moreover, an adaptive scheme called dynamic lazy insertion is designed to prevent replication from continuously overloading the system and to optimize the throughput of the HAC. Testbed evaluation demonstrates its feasibility and effectiveness in an overloaded HAC.
doi_str_mv	10.1109/TPDS.2011.83
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TPDS_2011_83</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5733339</ieee_id><sourcerecordid>1709771871</sourcerecordid><originalsourceid>FETCH-LOGICAL-c347t-72996a7488fc709d2f6f8c5760e898575b0949e45bf0c9a29a5384e2257255403</originalsourceid><addsrcrecordid>eNp90U1rGzEQBuClNNDU6S23XkQv6aHr6HMlHY3jJoFATJKehSyPXIW11pW0hvz7ynHoIYfORQN6mGF4m-ac4CkhWF8-La8epxQTMlXsQ3NKhFAtJYp9rD3motWU6E_N55yfMSZcYH7a-IX3wQWIBdm4RrO13ZWwB_RYbAE_9ugBdn1wtoQhIj-k-pHAbtEyDQ5yDnGDFnETImQUIroJm9_tbG9Db1ehD-UFzfsxF0hnzYm3fYYvb--k-fVz8TS_ae_ur2_ns7vWMS5LK6nWnZVcKe8k1mvqO6-ckB0GpZWQYoU118DFymOnLdVWMMWBUiGpEByzSXNxnLtLw58RcjHbkB30vY0wjNlo2jHKpWJVfv-vJHW_lERJUum3d_R5GFOsdxhNGKkjcVfRjyNyacg5gTe7FLY2vRiCzSEdc0jHHNIxr-u_HnkAgH9USFZLs79EBolR</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>913126306</pqid></control><display><type>article</type><title>Efficient and Adaptive Stateful Replication for Stream Processing Engines in High-Availability Cluster</title><source>IEEE Electronic Library (IEL)</source><creator>Feng, Yi-Hsuan ; Huang, Nen-Fu ; Wu, Yen-Min</creator><creatorcontrib>Feng, Yi-Hsuan ; Huang, Nen-Fu ; Wu, Yen-Min</creatorcontrib><description>Stateful stream process engines in high availability clusters (HACs) track a large number of concurrent flow states and replicate them to backups to provide reliable functionality. Under high traffic loads, existing solutions in such HACs are expensive owing to precise stateful replication. This work presents two novel methods to address this issue: randomization on replication representation and a replication scheme designed for when system becomes overloaded. A hashing structure called Multilevel Counting Bloom Filter (MLCBF) is proposed as a low resource-consuming solution of stateful replication. 
Its performance and tradeoffs are then evaluated based on theoretic analysis and extensive trace-based tests. Trace-based simulation reveals that MLCBF reduces network and memory requirements of replication typically by over 90 percent for URL categorization. Most importantly, MLCBF is quite as simple and practical for implementation and maintenance. Moreover, an adaptive scheme called dynamic lazy insertion is designed to prevent replication from overloading system continuously and optimize the throughput of HAC. Testbed evaluation demonstrates its feasibility and effectiveness in an overloaded HAC.</description><identifier>ISSN: 1045-9219</identifier><identifier>EISSN: 1558-2183</identifier><identifier>DOI: 10.1109/TPDS.2011.83</identifier><identifier>CODEN: ITDSEO</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Adaptive estimation ; adaptive method ; Adaptive systems ; bloom filters ; Clustering methods ; Clusters ; Dynamical systems ; Dynamics ; Engines ; Filters ; high availability ; Insertion ; Multiple hash functions ; Random processes ; Replication ; Streams</subject><ispartof>IEEE transactions on parallel and distributed systems, 2011-11, Vol.22 (11), p.1788-1796</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) Nov 2011</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c347t-72996a7488fc709d2f6f8c5760e898575b0949e45bf0c9a29a5384e2257255403</citedby><cites>FETCH-LOGICAL-c347t-72996a7488fc709d2f6f8c5760e898575b0949e45bf0c9a29a5384e2257255403</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5733339$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5733339$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Feng, Yi-Hsuan</creatorcontrib><creatorcontrib>Huang, Nen-Fu</creatorcontrib><creatorcontrib>Wu, Yen-Min</creatorcontrib><title>Efficient and Adaptive Stateful Replication for Stream Processing Engines in High-Availability Cluster</title><title>IEEE transactions on parallel and distributed systems</title><addtitle>TPDS</addtitle><description>Stateful stream process engines in high availability clusters (HACs) track a large number of concurrent flow states and replicate them to backups to provide reliable functionality. Under high traffic loads, existing solutions in such HACs are expensive owing to precise stateful replication. This work presents two novel methods to address this issue: randomization on replication representation and a replication scheme designed for when system becomes overloaded. A hashing structure called Multilevel Counting Bloom Filter (MLCBF) is proposed as a low resource-consuming solution of stateful replication. Its performance and tradeoffs are then evaluated based on theoretic analysis and extensive trace-based tests. Trace-based simulation reveals that MLCBF reduces network and memory requirements of replication typically by over 90 percent for URL categorization. 
Most importantly, MLCBF is quite as simple and practical for implementation and maintenance. Moreover, an adaptive scheme called dynamic lazy insertion is designed to prevent replication from overloading system continuously and optimize the throughput of HAC. Testbed evaluation demonstrates its feasibility and effectiveness in an overloaded HAC.</description><subject>Adaptive estimation</subject><subject>adaptive method</subject><subject>Adaptive systems</subject><subject>bloom filters</subject><subject>Clustering methods</subject><subject>Clusters</subject><subject>Dynamical systems</subject><subject>Dynamics</subject><subject>Engines</subject><subject>Filters</subject><subject>high availability</subject><subject>Insertion</subject><subject>Multiple hash functions</subject><subject>Random processes</subject><subject>Replication</subject><subject>Streams</subject><issn>1045-9219</issn><issn>1558-2183</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2011</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNp90U1rGzEQBuClNNDU6S23XkQv6aHr6HMlHY3jJoFATJKehSyPXIW11pW0hvz7ynHoIYfORQN6mGF4m-ac4CkhWF8-La8epxQTMlXsQ3NKhFAtJYp9rD3motWU6E_N55yfMSZcYH7a-IX3wQWIBdm4RrO13ZWwB_RYbAE_9ugBdn1wtoQhIj-k-pHAbtEyDQ5yDnGDFnETImQUIroJm9_tbG9Db1ehD-UFzfsxF0hnzYm3fYYvb--k-fVz8TS_ae_ur2_ns7vWMS5LK6nWnZVcKe8k1mvqO6-ckB0GpZWQYoU118DFymOnLdVWMMWBUiGpEByzSXNxnLtLw58RcjHbkB30vY0wjNlo2jHKpWJVfv-vJHW_lERJUum3d_R5GFOsdxhNGKkjcVfRjyNyacg5gTe7FLY2vRiCzSEdc0jHHNIxr-u_HnkAgH9USFZLs79EBolR</recordid><startdate>20111101</startdate><enddate>20111101</enddate><creator>Feng, Yi-Hsuan</creator><creator>Huang, Nen-Fu</creator><creator>Wu, Yen-Min</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>F28</scope><scope>FR3</scope><scope>7QH</scope><scope>7UA</scope><scope>C1K</scope><scope>F1W</scope><scope>H96</scope><scope>L.G</scope></search><sort><creationdate>20111101</creationdate><title>Efficient and Adaptive Stateful Replication for Stream Processing Engines in High-Availability Cluster</title><author>Feng, Yi-Hsuan ; Huang, Nen-Fu ; Wu, Yen-Min</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c347t-72996a7488fc709d2f6f8c5760e898575b0949e45bf0c9a29a5384e2257255403</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2011</creationdate><topic>Adaptive estimation</topic><topic>adaptive method</topic><topic>Adaptive systems</topic><topic>bloom filters</topic><topic>Clustering methods</topic><topic>Clusters</topic><topic>Dynamical systems</topic><topic>Dynamics</topic><topic>Engines</topic><topic>Filters</topic><topic>high availability</topic><topic>Insertion</topic><topic>Multiple hash functions</topic><topic>Random processes</topic><topic>Replication</topic><topic>Streams</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Feng, Yi-Hsuan</creatorcontrib><creatorcontrib>Huang, Nen-Fu</creatorcontrib><creatorcontrib>Wu, Yen-Min</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest 
Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><collection>Aqualine</collection><collection>Water Resources Abstracts</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ASFA: Aquatic Sciences and Fisheries Abstracts</collection><collection>Aquatic Science & Fisheries Abstracts (ASFA) 2: Ocean Technology, Policy & Non-Living Resources</collection><collection>Aquatic Science & Fisheries Abstracts (ASFA) Professional</collection><jtitle>IEEE transactions on parallel and distributed systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Feng, Yi-Hsuan</au><au>Huang, Nen-Fu</au><au>Wu, Yen-Min</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Efficient and Adaptive Stateful Replication for Stream Processing Engines in High-Availability Cluster</atitle><jtitle>IEEE transactions on parallel and distributed systems</jtitle><stitle>TPDS</stitle><date>2011-11-01</date><risdate>2011</risdate><volume>22</volume><issue>11</issue><spage>1788</spage><epage>1796</epage><pages>1788-1796</pages><issn>1045-9219</issn><eissn>1558-2183</eissn><coden>ITDSEO</coden><abstract>Stateful stream process engines in high availability clusters (HACs) track a large number of concurrent flow states and replicate them to backups to provide reliable functionality. Under high traffic loads, existing solutions in such HACs are expensive owing to precise stateful replication. 
This work presents two novel methods to address this issue: randomization on replication representation and a replication scheme designed for when system becomes overloaded. A hashing structure called Multilevel Counting Bloom Filter (MLCBF) is proposed as a low resource-consuming solution of stateful replication. Its performance and tradeoffs are then evaluated based on theoretic analysis and extensive trace-based tests. Trace-based simulation reveals that MLCBF reduces network and memory requirements of replication typically by over 90 percent for URL categorization. Most importantly, MLCBF is quite as simple and practical for implementation and maintenance. Moreover, an adaptive scheme called dynamic lazy insertion is designed to prevent replication from overloading system continuously and optimize the throughput of HAC. Testbed evaluation demonstrates its feasibility and effectiveness in an overloaded HAC.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TPDS.2011.83</doi><tpages>9</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1045-9219
ispartof	IEEE transactions on parallel and distributed systems, 2011-11, Vol.22 (11), p.1788-1796
issn	1045-9219 1558-2183
language	eng
recordid	cdi_crossref_primary_10_1109_TPDS_2011_83
source	IEEE Electronic Library (IEL)
subjects	Adaptive estimation; adaptive method; Adaptive systems; bloom filters; Clustering methods; Clusters; Dynamical systems; Dynamics; Engines; Filters; high availability; Insertion; Multiple hash functions; Random processes; Replication; Streams
title	Efficient and Adaptive Stateful Replication for Stream Processing Engines in High-Availability Cluster
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-30T09%3A43%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Efficient%20and%20Adaptive%20Stateful%20Replication%20for%20Stream%20Processing%20Engines%20in%20High-Availability%20Cluster&rft.jtitle=IEEE%20transactions%20on%20parallel%20and%20distributed%20systems&rft.au=Feng,%20Yi-Hsuan&rft.date=2011-11-01&rft.volume=22&rft.issue=11&rft.spage=1788&rft.epage=1796&rft.pages=1788-1796&rft.issn=1045-9219&rft.eissn=1558-2183&rft.coden=ITDSEO&rft_id=info:doi/10.1109/TPDS.2011.83&rft_dat=%3Cproquest_RIE%3E1709771871%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=913126306&rft_id=info:pmid/&rft_ieee_id=5733339&rfr_iscdi=true