Optimization for Speculative Execution in Big Data Processing Clusters

A big parallel processing job can be delayed substantially when even one of its many tasks is assigned to an unreliable or congested machine. To tackle this so-called straggler problem, most parallel processing frameworks such as MapReduce have adopted various strategies under which the syste...

Detailed description

Bibliographic details
Published in: IEEE transactions on parallel and distributed systems 2017-02, Vol.28 (2), p.530-545
Main authors: Xu, Huanle; Lau, Wing Cheong
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 545
container_issue 2
container_start_page 530
container_title IEEE transactions on parallel and distributed systems
container_volume 28
creator Xu, Huanle
Lau, Wing Cheong
description A big parallel processing job can be delayed substantially when even one of its many tasks is assigned to an unreliable or congested machine. To tackle this so-called straggler problem, most parallel processing frameworks such as MapReduce have adopted various strategies under which the system may speculatively launch additional copies of the same task if its progress is abnormally slow and extra idle resources are available. In this paper, we focus on the design of speculative execution schemes for parallel processing clusters from an optimization perspective under different loading conditions. For the lightly loaded case, we analyze and propose one cloning scheme, namely the Smart Cloning Algorithm (SCA), which is based on maximizing the overall system utility. We also derive the workload threshold under which SCA should be used for speculative execution. For the heavily loaded case, we propose the Enhanced Speculative Execution (ESE) algorithm, which is an extension of the Microsoft Mantri scheme to mitigate stragglers. Our simulation results show that SCA reduces the total job flowtime, i.e., the job delay/response time, by nearly 6 percent compared to the speculative execution strategy of Microsoft Mantri. In addition, we show that the ESE algorithm outperforms the Mantri baseline scheme by 71 percent in terms of the job flowtime while consuming the same amount of computation resource.
doi_str_mv 10.1109/TPDS.2016.2564962
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TPDS_2016_2564962</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>7466828</ieee_id><sourcerecordid>2174475065</sourcerecordid><originalsourceid>FETCH-LOGICAL-c293t-3783ac0929888dc0d48af782807ac2cd3b906fa64cb34021244c9a2b1179452d3</originalsourceid><addsrcrecordid>eNo9kFtLAzEQhYMoWC8_QHwJ-Lw198uj9qJCoYXW55BmsyWl3V2TXVF_vaktPs0Z5pyZ4QPgDqMhxkg_rhbj5ZAgLIaEC6YFOQMDzLkqCFb0PGvEeKEJ1pfgKqUtQphxxAZgOm-7sA8_tgtNDasmwmXrXb_L_aeHk6-s_yahhs9hA8e2s3ARG-dTCvUGjnZ96nxMN-Cisrvkb0_1GrxPJ6vRazGbv7yNnmaFI5p2BZWKWoc00Uqp0qGSKVtJRRSS1hFX0rVGorKCuTVliGDCmNOWrDGWmnFS0mvwcNzbxuaj96kz26aPdT5pCJaMSY4Ezy58dLnYpBR9ZdoY9jZ-G4zMAZc54DIHXOaEK2fuj5ngvf_3SyZEfo_-AiNaZT8</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2174475065</pqid></control><display><type>article</type><title>Optimization for Speculative Execution in Big Data Processing Clusters</title><source>IEEE Electronic Library (IEL)</source><creator>Xu, Huanle ; Lau, Wing Cheong</creator><creatorcontrib>Xu, Huanle ; Lau, Wing Cheong</creatorcontrib><description>A big parallel processing job can be delayed substantially as long as one of its many tasks is being assigned to an unreliable or congested machine. To tackle this so-called straggler problem, most parallel processing frameworks such as MapReduce have adopted various strategies under which the system may speculatively launch additional copies of the same task if its progress is abnormally slow when extra idling resource is available. In this paper, we focus on the design of speculative execution schemes for parallel processing clusters from an optimization perspective under different loading conditions. For the lightly loaded case, we analyze and propose one cloning scheme, namely, the Smart Cloning Algorithm (SCA) which is based on maximizing the overall system utility. 
We also derive the workload threshold under which SCA should be used for speculative execution. For the heavily loaded case, we propose the Enhanced Speculative Execution (ESE) algorithm which is an extension of the Microsoft Mantri scheme to mitigate stragglers. Our simulation results show SCA reduces the total job flowtime, i.e., the job delay/ response time by nearly 6 percent comparing to the speculative execution strategy of Microsoft Mantri. In addition, we show that the ESE Algorithm outperforms the Mantri baseline scheme by 71 percent in terms of the job flowtime while consuming the same amount of computation resource.</description><identifier>ISSN: 1045-9219</identifier><identifier>EISSN: 1558-2183</identifier><identifier>DOI: 10.1109/TPDS.2016.2564962</identifier><identifier>CODEN: ITDSEO</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Algorithms ; Big data ; Central processing units ; Cloning ; Clustering algorithms ; Clusters ; Computer simulation ; CPUs ; Data management ; Data processing ; Hierarchies ; Job scheduling ; Job shop scheduling ; Optimization ; Parallel processing ; Resource scheduling ; Response time ; Servers ; speculative execution ; straggler detection</subject><ispartof>IEEE transactions on parallel and distributed systems, 2017-02, Vol.28 (2), p.530-545</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2017</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c293t-3783ac0929888dc0d48af782807ac2cd3b906fa64cb34021244c9a2b1179452d3</citedby><cites>FETCH-LOGICAL-c293t-3783ac0929888dc0d48af782807ac2cd3b906fa64cb34021244c9a2b1179452d3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/7466828$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/7466828$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Xu, Huanle</creatorcontrib><creatorcontrib>Lau, Wing Cheong</creatorcontrib><title>Optimization for Speculative Execution in Big Data Processing Clusters</title><title>IEEE transactions on parallel and distributed systems</title><addtitle>TPDS</addtitle><description>A big parallel processing job can be delayed substantially as long as one of its many tasks is being assigned to an unreliable or congested machine. To tackle this so-called straggler problem, most parallel processing frameworks such as MapReduce have adopted various strategies under which the system may speculatively launch additional copies of the same task if its progress is abnormally slow when extra idling resource is available. In this paper, we focus on the design of speculative execution schemes for parallel processing clusters from an optimization perspective under different loading conditions. For the lightly loaded case, we analyze and propose one cloning scheme, namely, the Smart Cloning Algorithm (SCA) which is based on maximizing the overall system utility. We also derive the workload threshold under which SCA should be used for speculative execution. 
For the heavily loaded case, we propose the Enhanced Speculative Execution (ESE) algorithm which is an extension of the Microsoft Mantri scheme to mitigate stragglers. Our simulation results show SCA reduces the total job flowtime, i.e., the job delay/ response time by nearly 6 percent comparing to the speculative execution strategy of Microsoft Mantri. In addition, we show that the ESE Algorithm outperforms the Mantri baseline scheme by 71 percent in terms of the job flowtime while consuming the same amount of computation resource.</description><subject>Algorithms</subject><subject>Big data</subject><subject>Central processing units</subject><subject>Cloning</subject><subject>Clustering algorithms</subject><subject>Clusters</subject><subject>Computer simulation</subject><subject>CPUs</subject><subject>Data management</subject><subject>Data processing</subject><subject>Hierarchies</subject><subject>Job scheduling</subject><subject>Job shop scheduling</subject><subject>Optimization</subject><subject>Parallel processing</subject><subject>Resource scheduling</subject><subject>Response time</subject><subject>Servers</subject><subject>speculative execution</subject><subject>straggler detection</subject><issn>1045-9219</issn><issn>1558-2183</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kFtLAzEQhYMoWC8_QHwJ-Lw198uj9qJCoYXW55BmsyWl3V2TXVF_vaktPs0Z5pyZ4QPgDqMhxkg_rhbj5ZAgLIaEC6YFOQMDzLkqCFb0PGvEeKEJ1pfgKqUtQphxxAZgOm-7sA8_tgtNDasmwmXrXb_L_aeHk6-s_yahhs9hA8e2s3ARG-dTCvUGjnZ96nxMN-Cisrvkb0_1GrxPJ6vRazGbv7yNnmaFI5p2BZWKWoc00Uqp0qGSKVtJRRSS1hFX0rVGorKCuTVliGDCmNOWrDGWmnFS0mvwcNzbxuaj96kz26aPdT5pCJaMSY4Ezy58dLnYpBR9ZdoY9jZ-G4zMAZc54DIHXOaEK2fuj5ngvf_3SyZEfo_-AiNaZT8</recordid><startdate>20170201</startdate><enddate>20170201</enddate><creator>Xu, Huanle</creator><creator>Lau, Wing Cheong</creator><general>IEEE</general><general>The Institute of Electrical and 
Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20170201</creationdate><title>Optimization for Speculative Execution in Big Data Processing Clusters</title><author>Xu, Huanle ; Lau, Wing Cheong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c293t-3783ac0929888dc0d48af782807ac2cd3b906fa64cb34021244c9a2b1179452d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Algorithms</topic><topic>Big data</topic><topic>Central processing units</topic><topic>Cloning</topic><topic>Clustering algorithms</topic><topic>Clusters</topic><topic>Computer simulation</topic><topic>CPUs</topic><topic>Data management</topic><topic>Data processing</topic><topic>Hierarchies</topic><topic>Job scheduling</topic><topic>Job shop scheduling</topic><topic>Optimization</topic><topic>Parallel processing</topic><topic>Resource scheduling</topic><topic>Response time</topic><topic>Servers</topic><topic>speculative execution</topic><topic>straggler detection</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Xu, Huanle</creatorcontrib><creatorcontrib>Lau, Wing Cheong</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies 
Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on parallel and distributed systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Xu, Huanle</au><au>Lau, Wing Cheong</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Optimization for Speculative Execution in Big Data Processing Clusters</atitle><jtitle>IEEE transactions on parallel and distributed systems</jtitle><stitle>TPDS</stitle><date>2017-02-01</date><risdate>2017</risdate><volume>28</volume><issue>2</issue><spage>530</spage><epage>545</epage><pages>530-545</pages><issn>1045-9219</issn><eissn>1558-2183</eissn><coden>ITDSEO</coden><abstract>A big parallel processing job can be delayed substantially as long as one of its many tasks is being assigned to an unreliable or congested machine. To tackle this so-called straggler problem, most parallel processing frameworks such as MapReduce have adopted various strategies under which the system may speculatively launch additional copies of the same task if its progress is abnormally slow when extra idling resource is available. In this paper, we focus on the design of speculative execution schemes for parallel processing clusters from an optimization perspective under different loading conditions. For the lightly loaded case, we analyze and propose one cloning scheme, namely, the Smart Cloning Algorithm (SCA) which is based on maximizing the overall system utility. We also derive the workload threshold under which SCA should be used for speculative execution. For the heavily loaded case, we propose the Enhanced Speculative Execution (ESE) algorithm which is an extension of the Microsoft Mantri scheme to mitigate stragglers. 
Our simulation results show SCA reduces the total job flowtime, i.e., the job delay/ response time by nearly 6 percent comparing to the speculative execution strategy of Microsoft Mantri. In addition, we show that the ESE Algorithm outperforms the Mantri baseline scheme by 71 percent in terms of the job flowtime while consuming the same amount of computation resource.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TPDS.2016.2564962</doi><tpages>16</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1045-9219
ispartof IEEE transactions on parallel and distributed systems, 2017-02, Vol.28 (2), p.530-545
issn 1045-9219
1558-2183
language eng
recordid cdi_crossref_primary_10_1109_TPDS_2016_2564962
source IEEE Electronic Library (IEL)
subjects Algorithms
Big data
Central processing units
Cloning
Clustering algorithms
Clusters
Computer simulation
CPUs
Data management
Data processing
Hierarchies
Job scheduling
Job shop scheduling
Optimization
Parallel processing
Resource scheduling
Response time
Servers
speculative execution
straggler detection
title Optimization for Speculative Execution in Big Data Processing Clusters
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T20%3A40%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Optimization%20for%20Speculative%20Execution%20in%20Big%20Data%20Processing%20Clusters&rft.jtitle=IEEE%20transactions%20on%20parallel%20and%20distributed%20systems&rft.au=Xu,%20Huanle&rft.date=2017-02-01&rft.volume=28&rft.issue=2&rft.spage=530&rft.epage=545&rft.pages=530-545&rft.issn=1045-9219&rft.eissn=1558-2183&rft.coden=ITDSEO&rft_id=info:doi/10.1109/TPDS.2016.2564962&rft_dat=%3Cproquest_RIE%3E2174475065%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2174475065&rft_id=info:pmid/&rft_ieee_id=7466828&rfr_iscdi=true