A strategy for scheduling reduce task based on intermediate data locality of the MapReduce

This paper studies task scheduling, from the perspective of resource allocation and management, as a way to improve the performance of the Hadoop system. To save network bandwidth in a Hadoop cluster environment and improve the performance of the Hadoop system, a ReduceTask schedul...

Bibliographic details
Published in: Cluster computing 2017-12, Vol.20 (4), p.2821-2831
Main authors: Shang, Fengjun, Chen, Xuanling, Yan, Chenyun
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 2831
container_issue 4
container_start_page 2821
container_title Cluster computing
container_volume 20
creator Shang, Fengjun
Chen, Xuanling
Yan, Chenyun
description This paper studies task scheduling, from the perspective of resource allocation and management, as a way to improve the performance of the Hadoop system. To save network bandwidth in a Hadoop cluster environment and improve the performance of the Hadoop system, a ReduceTask scheduling strategy based on data locality is improved. During the MapReduce stage, two main data streams flow through the cluster network: slow-task migration and remote copies of data. These two overlapping bursts of data transfer can easily become bottlenecks of the cluster network. To reduce the amount of remotely copied data, we combine data locality with a minimum network resource consumption (MNRC) model that we establish. MNRC is used to calculate the network resources consumed by a ReduceTask. Based on this model, we design a delay priority scheduling policy for ReduceTasks that is based on the cost of network resource consumption. Finally, MNRC is verified by simulation experiments. Evaluation results show that MNRC saves an average of 7.5% of cluster network resources in heterogeneous environments.
doi_str_mv 10.1007/s10586-017-0972-7
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2918217462</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2918217462</sourcerecordid><originalsourceid>FETCH-LOGICAL-c316t-c7dc1ca96ae0975e1c139acf533826fae2f01892086bbde49d2804499e8e8bf03</originalsourceid><addsrcrecordid>eNp1kE9LAzEQxYMoWKsfwFvAczR_djfJsRS1QkUQvXgJ2eyk3brdrUl66Lc3dQVPnubBvN-b4SF0zegto1TeRUZLVRHKJKFaciJP0ISVUhBZFuI0a5G3UpXyHF3EuKH06NIT9DHDMQWbYHXAfgg4ujU0-67tVzhk4QAnGz9xbSM0eOhx2ycIW2jajODGJou7wdmuTQc8eJzWgJ_t7vWHvERn3nYRrn7nFL0_3L_NF2T58vg0ny2JE6xKxMnGMWd1ZSH_VAJzTGjrfCmE4pW3wD1lSnOqqrpuoNANV7QotAYFqvZUTNHNmLsLw9ceYjKbYR_6fNJwzRRnsqh4drHR5cIQYwBvdqHd2nAwjJpjhWas0OQKzbEcIzPDRyZmb7-C8Jf8P_QNMtp0Hg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2918217462</pqid></control><display><type>article</type><title>A strategy for scheduling reduce task based on intermediate data locality of the MapReduce</title><source>SpringerLink Journals</source><source>ProQuest Central UK/Ireland</source><source>ProQuest Central</source><creator>Shang, Fengjun ; Chen, Xuanling ; Yan, Chenyun</creator><creatorcontrib>Shang, Fengjun ; Chen, Xuanling ; Yan, Chenyun</creatorcontrib><description>In this paper, researching on task scheduling is a way from the perspective of resource allocation and management to improve performance of Hadoop system. In order to save the network bandwidth resources in Hadoop cluster environment and improve the performance of Hadoop system, a ReduceTask scheduling strategy that based on data-locality is improved. In MapReduce stage, there are two main data streams in cluster network, they are slow task migration and remote copies of data. The two overlapping burst data transfer can easily become bottlenecks of the cluster network. 
To reduce the amount of remote copies of data, combining with data-locality, we establish a minimum network resource consumption model (MNRC). MNRC is used to calculate the network resources consumption of ReduceTask. Based on this model, we design a delay priority scheduling policy for the ReduceTask which is based on the cost of network resource consumption. Finally, MNRC is verified by simulation experiments. Evaluation results show that MNRC outperforms the saving cluster network resource by an average of 7.5% in heterogeneous.</description><identifier>ISSN: 1386-7857</identifier><identifier>EISSN: 1573-7543</identifier><identifier>DOI: 10.1007/s10586-017-0972-7</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Bandwidths ; Big Data ; Clusters ; Computer Communication Networks ; Computer Science ; Consumption ; Data transfer (computers) ; Data transmission ; Information industry ; Operating Systems ; Optimization ; Performance enhancement ; Priority scheduling ; Processor Architectures ; Resource allocation ; Scheduling ; Task scheduling</subject><ispartof>Cluster computing, 2017-12, Vol.20 (4), p.2821-2831</ispartof><rights>Springer Science+Business Media, LLC 2017</rights><rights>Springer Science+Business Media, LLC 
2017.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c316t-c7dc1ca96ae0975e1c139acf533826fae2f01892086bbde49d2804499e8e8bf03</citedby><cites>FETCH-LOGICAL-c316t-c7dc1ca96ae0975e1c139acf533826fae2f01892086bbde49d2804499e8e8bf03</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10586-017-0972-7$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2918217462?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,777,781,21369,27905,27906,33725,41469,42538,43786,51300,64364,64368,72218</link.rule.ids></links><search><creatorcontrib>Shang, Fengjun</creatorcontrib><creatorcontrib>Chen, Xuanling</creatorcontrib><creatorcontrib>Yan, Chenyun</creatorcontrib><title>A strategy for scheduling reduce task based on intermediate data locality of the MapReduce</title><title>Cluster computing</title><addtitle>Cluster Comput</addtitle><description>In this paper, researching on task scheduling is a way from the perspective of resource allocation and management to improve performance of Hadoop system. In order to save the network bandwidth resources in Hadoop cluster environment and improve the performance of Hadoop system, a ReduceTask scheduling strategy that based on data-locality is improved. In MapReduce stage, there are two main data streams in cluster network, they are slow task migration and remote copies of data. The two overlapping burst data transfer can easily become bottlenecks of the cluster network. To reduce the amount of remote copies of data, combining with data-locality, we establish a minimum network resource consumption model (MNRC). MNRC is used to calculate the network resources consumption of ReduceTask. 
Based on this model, we design a delay priority scheduling policy for the ReduceTask which is based on the cost of network resource consumption. Finally, MNRC is verified by simulation experiments. Evaluation results show that MNRC outperforms the saving cluster network resource by an average of 7.5% in heterogeneous.</description><subject>Bandwidths</subject><subject>Big Data</subject><subject>Clusters</subject><subject>Computer Communication Networks</subject><subject>Computer Science</subject><subject>Consumption</subject><subject>Data transfer (computers)</subject><subject>Data transmission</subject><subject>Information industry</subject><subject>Operating Systems</subject><subject>Optimization</subject><subject>Performance enhancement</subject><subject>Priority scheduling</subject><subject>Processor Architectures</subject><subject>Resource allocation</subject><subject>Scheduling</subject><subject>Task scheduling</subject><issn>1386-7857</issn><issn>1573-7543</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp1kE9LAzEQxYMoWKsfwFvAczR_djfJsRS1QkUQvXgJ2eyk3brdrUl66Lc3dQVPnubBvN-b4SF0zegto1TeRUZLVRHKJKFaciJP0ISVUhBZFuI0a5G3UpXyHF3EuKH06NIT9DHDMQWbYHXAfgg4ujU0-67tVzhk4QAnGz9xbSM0eOhx2ycIW2jajODGJou7wdmuTQc8eJzWgJ_t7vWHvERn3nYRrn7nFL0_3L_NF2T58vg0ny2JE6xKxMnGMWd1ZSH_VAJzTGjrfCmE4pW3wD1lSnOqqrpuoNANV7QotAYFqvZUTNHNmLsLw9ceYjKbYR_6fNJwzRRnsqh4drHR5cIQYwBvdqHd2nAwjJpjhWas0OQKzbEcIzPDRyZmb7-C8Jf8P_QNMtp0Hg</recordid><startdate>20171201</startdate><enddate>20171201</enddate><creator>Shang, Fengjun</creator><creator>Chen, Xuanling</creator><creator>Yan, Chenyun</creator><general>Springer US</general><general>Springer Nature 
B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope></search><sort><creationdate>20171201</creationdate><title>A strategy for scheduling reduce task based on intermediate data locality of the MapReduce</title><author>Shang, Fengjun ; Chen, Xuanling ; Yan, Chenyun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c316t-c7dc1ca96ae0975e1c139acf533826fae2f01892086bbde49d2804499e8e8bf03</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Bandwidths</topic><topic>Big Data</topic><topic>Clusters</topic><topic>Computer Communication Networks</topic><topic>Computer Science</topic><topic>Consumption</topic><topic>Data transfer (computers)</topic><topic>Data transmission</topic><topic>Information industry</topic><topic>Operating Systems</topic><topic>Optimization</topic><topic>Performance enhancement</topic><topic>Priority scheduling</topic><topic>Processor Architectures</topic><topic>Resource allocation</topic><topic>Scheduling</topic><topic>Task scheduling</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Shang, Fengjun</creatorcontrib><creatorcontrib>Chen, Xuanling</creatorcontrib><creatorcontrib>Yan, Chenyun</creatorcontrib><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest 
Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><jtitle>Cluster computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Shang, Fengjun</au><au>Chen, Xuanling</au><au>Yan, Chenyun</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A strategy for scheduling reduce task based on intermediate data locality of the MapReduce</atitle><jtitle>Cluster computing</jtitle><stitle>Cluster Comput</stitle><date>2017-12-01</date><risdate>2017</risdate><volume>20</volume><issue>4</issue><spage>2821</spage><epage>2831</epage><pages>2821-2831</pages><issn>1386-7857</issn><eissn>1573-7543</eissn><abstract>In this paper, researching on task scheduling is a way from the perspective of resource allocation and management to improve performance of Hadoop system. In order to save the network bandwidth resources in Hadoop cluster environment and improve the performance of Hadoop system, a ReduceTask scheduling strategy that based on data-locality is improved. In MapReduce stage, there are two main data streams in cluster network, they are slow task migration and remote copies of data. The two overlapping burst data transfer can easily become bottlenecks of the cluster network. 
To reduce the amount of remote copies of data, combining with data-locality, we establish a minimum network resource consumption model (MNRC). MNRC is used to calculate the network resources consumption of ReduceTask. Based on this model, we design a delay priority scheduling policy for the ReduceTask which is based on the cost of network resource consumption. Finally, MNRC is verified by simulation experiments. Evaluation results show that MNRC outperforms the saving cluster network resource by an average of 7.5% in heterogeneous.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10586-017-0972-7</doi><tpages>11</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1386-7857
ispartof Cluster computing, 2017-12, Vol.20 (4), p.2821-2831
issn 1386-7857
1573-7543
language eng
recordid cdi_proquest_journals_2918217462
source SpringerLink Journals; ProQuest Central UK/Ireland; ProQuest Central
subjects Bandwidths
Big Data
Clusters
Computer Communication Networks
Computer Science
Consumption
Data transfer (computers)
Data transmission
Information industry
Operating Systems
Optimization
Performance enhancement
Priority scheduling
Processor Architectures
Resource allocation
Scheduling
Task scheduling
title A strategy for scheduling reduce task based on intermediate data locality of the MapReduce
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T05%3A09%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20strategy%20for%20scheduling%20reduce%20task%20based%20on%20intermediate%20data%20locality%20of%20the%20MapReduce&rft.jtitle=Cluster%20computing&rft.au=Shang,%20Fengjun&rft.date=2017-12-01&rft.volume=20&rft.issue=4&rft.spage=2821&rft.epage=2831&rft.pages=2821-2831&rft.issn=1386-7857&rft.eissn=1573-7543&rft_id=info:doi/10.1007/s10586-017-0972-7&rft_dat=%3Cproquest_cross%3E2918217462%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918217462&rft_id=info:pmid/&rfr_iscdi=true