A strategy for scheduling reduce task based on intermediate data locality of the MapReduce

This paper studies task scheduling, from the perspective of resource allocation and management, as a way to improve the performance of the Hadoop system. To save network bandwidth in a Hadoop cluster environment and improve the performance of the Hadoop system, a ReduceTask schedul...

Bibliographic details
Published in:	Cluster computing 2017-12, Vol.20 (4), p.2821-2831
Main authors:	Shang, Fengjun ; Chen, Xuanling ; Yan, Chenyun
Format:	Article
Language:	eng
Subjects:	Bandwidths ; Big Data ; Clusters ; Computer Communication Networks ; Computer Science ; Consumption ; Data transfer (computers) ; Data transmission ; Information industry ; Operating Systems ; Optimization ; Performance enhancement ; Priority scheduling ; Processor Architectures ; Resource allocation ; Scheduling ; Task scheduling
Online access:	Full text

container_end_page	2831
container_issue	4
container_start_page	2821
container_title	Cluster computing
container_volume	20
creator	Shang, Fengjun ; Chen, Xuanling ; Yan, Chenyun
description	This paper studies task scheduling, from the perspective of resource allocation and management, as a way to improve the performance of the Hadoop system. To save network bandwidth in a Hadoop cluster environment and improve the performance of the Hadoop system, a ReduceTask scheduling strategy based on data locality is improved. In the MapReduce stage, there are two main data streams in the cluster network: slow task migration and remote copies of data. These two overlapping bursts of data transfer can easily become bottlenecks of the cluster network. To reduce the amount of remote data copying, we combine data locality with a minimum network resource consumption model (MNRC). MNRC is used to calculate the network resource consumption of a ReduceTask. Based on this model, we design a delay priority scheduling policy for the ReduceTask that is driven by the cost of network resource consumption. Finally, MNRC is verified by simulation experiments. Evaluation results show that MNRC saves cluster network resources by an average of 7.5% in a heterogeneous environment.
doi_str_mv	10.1007/s10586-017-0972-7
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2918217462</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2918217462</sourcerecordid><originalsourceid>FETCH-LOGICAL-c316t-c7dc1ca96ae0975e1c139acf533826fae2f01892086bbde49d2804499e8e8bf03</originalsourceid><addsrcrecordid>eNp1kE9LAzEQxYMoWKsfwFvAczR_djfJsRS1QkUQvXgJ2eyk3brdrUl66Lc3dQVPnubBvN-b4SF0zegto1TeRUZLVRHKJKFaciJP0ISVUhBZFuI0a5G3UpXyHF3EuKH06NIT9DHDMQWbYHXAfgg4ujU0-67tVzhk4QAnGz9xbSM0eOhx2ycIW2jajODGJou7wdmuTQc8eJzWgJ_t7vWHvERn3nYRrn7nFL0_3L_NF2T58vg0ny2JE6xKxMnGMWd1ZSH_VAJzTGjrfCmE4pW3wD1lSnOqqrpuoNANV7QotAYFqvZUTNHNmLsLw9ceYjKbYR_6fNJwzRRnsqh4drHR5cIQYwBvdqHd2nAwjJpjhWas0OQKzbEcIzPDRyZmb7-C8Jf8P_QNMtp0Hg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2918217462</pqid></control><display><type>article</type><title>A strategy for scheduling reduce task based on intermediate data locality of the MapReduce</title><source>SpringerLink Journals</source><source>ProQuest Central UK/Ireland</source><source>ProQuest Central</source><creator>Shang, Fengjun ; Chen, Xuanling ; Yan, Chenyun</creator><creatorcontrib>Shang, Fengjun ; Chen, Xuanling ; Yan, Chenyun</creatorcontrib><description>In this paper, researching on task scheduling is a way from the perspective of resource allocation and management to improve performance of Hadoop system. In order to save the network bandwidth resources in Hadoop cluster environment and improve the performance of Hadoop system, a ReduceTask scheduling strategy that based on data-locality is improved. In MapReduce stage, there are two main data streams in cluster network, they are slow task migration and remote copies of data. The two overlapping burst data transfer can easily become bottlenecks of the cluster network. 
To reduce the amount of remote copies of data, combining with data-locality, we establish a minimum network resource consumption model (MNRC). MNRC is used to calculate the network resources consumption of ReduceTask. Based on this model, we design a delay priority scheduling policy for the ReduceTask which is based on the cost of network resource consumption. Finally, MNRC is verified by simulation experiments. Evaluation results show that MNRC outperforms the saving cluster network resource by an average of 7.5% in heterogeneous.</description><identifier>ISSN: 1386-7857</identifier><identifier>EISSN: 1573-7543</identifier><identifier>DOI: 10.1007/s10586-017-0972-7</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Bandwidths ; Big Data ; Clusters ; Computer Communication Networks ; Computer Science ; Consumption ; Data transfer (computers) ; Data transmission ; Information industry ; Operating Systems ; Optimization ; Performance enhancement ; Priority scheduling ; Processor Architectures ; Resource allocation ; Scheduling ; Task scheduling</subject><ispartof>Cluster computing, 2017-12, Vol.20 (4), p.2821-2831</ispartof><rights>Springer Science+Business Media, LLC 2017</rights><rights>Springer Science+Business Media, LLC 
2017.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c316t-c7dc1ca96ae0975e1c139acf533826fae2f01892086bbde49d2804499e8e8bf03</citedby><cites>FETCH-LOGICAL-c316t-c7dc1ca96ae0975e1c139acf533826fae2f01892086bbde49d2804499e8e8bf03</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10586-017-0972-7$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2918217462?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,777,781,21369,27905,27906,33725,41469,42538,43786,51300,64364,64368,72218</link.rule.ids></links><search><creatorcontrib>Shang, Fengjun</creatorcontrib><creatorcontrib>Chen, Xuanling</creatorcontrib><creatorcontrib>Yan, Chenyun</creatorcontrib><title>A strategy for scheduling reduce task based on intermediate data locality of the MapReduce</title><title>Cluster computing</title><addtitle>Cluster Comput</addtitle><description>In this paper, researching on task scheduling is a way from the perspective of resource allocation and management to improve performance of Hadoop system. In order to save the network bandwidth resources in Hadoop cluster environment and improve the performance of Hadoop system, a ReduceTask scheduling strategy that based on data-locality is improved. In MapReduce stage, there are two main data streams in cluster network, they are slow task migration and remote copies of data. The two overlapping burst data transfer can easily become bottlenecks of the cluster network. To reduce the amount of remote copies of data, combining with data-locality, we establish a minimum network resource consumption model (MNRC). MNRC is used to calculate the network resources consumption of ReduceTask. 
Based on this model, we design a delay priority scheduling policy for the ReduceTask which is based on the cost of network resource consumption. Finally, MNRC is verified by simulation experiments. Evaluation results show that MNRC outperforms the saving cluster network resource by an average of 7.5% in heterogeneous.</description><subject>Bandwidths</subject><subject>Big Data</subject><subject>Clusters</subject><subject>Computer Communication Networks</subject><subject>Computer Science</subject><subject>Consumption</subject><subject>Data transfer (computers)</subject><subject>Data transmission</subject><subject>Information industry</subject><subject>Operating Systems</subject><subject>Optimization</subject><subject>Performance enhancement</subject><subject>Priority scheduling</subject><subject>Processor Architectures</subject><subject>Resource allocation</subject><subject>Scheduling</subject><subject>Task scheduling</subject><issn>1386-7857</issn><issn>1573-7543</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp1kE9LAzEQxYMoWKsfwFvAczR_djfJsRS1QkUQvXgJ2eyk3brdrUl66Lc3dQVPnubBvN-b4SF0zegto1TeRUZLVRHKJKFaciJP0ISVUhBZFuI0a5G3UpXyHF3EuKH06NIT9DHDMQWbYHXAfgg4ujU0-67tVzhk4QAnGz9xbSM0eOhx2ycIW2jajODGJou7wdmuTQc8eJzWgJ_t7vWHvERn3nYRrn7nFL0_3L_NF2T58vg0ny2JE6xKxMnGMWd1ZSH_VAJzTGjrfCmE4pW3wD1lSnOqqrpuoNANV7QotAYFqvZUTNHNmLsLw9ceYjKbYR_6fNJwzRRnsqh4drHR5cIQYwBvdqHd2nAwjJpjhWas0OQKzbEcIzPDRyZmb7-C8Jf8P_QNMtp0Hg</recordid><startdate>20171201</startdate><enddate>20171201</enddate><creator>Shang, Fengjun</creator><creator>Chen, Xuanling</creator><creator>Yan, Chenyun</creator><general>Springer US</general><general>Springer Nature 
B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope></search><sort><creationdate>20171201</creationdate><title>A strategy for scheduling reduce task based on intermediate data locality of the MapReduce</title><author>Shang, Fengjun ; Chen, Xuanling ; Yan, Chenyun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c316t-c7dc1ca96ae0975e1c139acf533826fae2f01892086bbde49d2804499e8e8bf03</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Bandwidths</topic><topic>Big Data</topic><topic>Clusters</topic><topic>Computer Communication Networks</topic><topic>Computer Science</topic><topic>Consumption</topic><topic>Data transfer (computers)</topic><topic>Data transmission</topic><topic>Information industry</topic><topic>Operating Systems</topic><topic>Optimization</topic><topic>Performance enhancement</topic><topic>Priority scheduling</topic><topic>Processor Architectures</topic><topic>Resource allocation</topic><topic>Scheduling</topic><topic>Task scheduling</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Shang, Fengjun</creatorcontrib><creatorcontrib>Chen, Xuanling</creatorcontrib><creatorcontrib>Yan, Chenyun</creatorcontrib><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest 
Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><jtitle>Cluster computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Shang, Fengjun</au><au>Chen, Xuanling</au><au>Yan, Chenyun</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A strategy for scheduling reduce task based on intermediate data locality of the MapReduce</atitle><jtitle>Cluster computing</jtitle><stitle>Cluster Comput</stitle><date>2017-12-01</date><risdate>2017</risdate><volume>20</volume><issue>4</issue><spage>2821</spage><epage>2831</epage><pages>2821-2831</pages><issn>1386-7857</issn><eissn>1573-7543</eissn><abstract>In this paper, researching on task scheduling is a way from the perspective of resource allocation and management to improve performance of Hadoop system. In order to save the network bandwidth resources in Hadoop cluster environment and improve the performance of Hadoop system, a ReduceTask scheduling strategy that based on data-locality is improved. In MapReduce stage, there are two main data streams in cluster network, they are slow task migration and remote copies of data. The two overlapping burst data transfer can easily become bottlenecks of the cluster network. 
To reduce the amount of remote copies of data, combining with data-locality, we establish a minimum network resource consumption model (MNRC). MNRC is used to calculate the network resources consumption of ReduceTask. Based on this model, we design a delay priority scheduling policy for the ReduceTask which is based on the cost of network resource consumption. Finally, MNRC is verified by simulation experiments. Evaluation results show that MNRC outperforms the saving cluster network resource by an average of 7.5% in heterogeneous.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10586-017-0972-7</doi><tpages>11</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 1386-7857
ispartof	Cluster computing, 2017-12, Vol.20 (4), p.2821-2831
issn	1386-7857 1573-7543
language	eng
recordid	cdi_proquest_journals_2918217462
source	SpringerLink Journals; ProQuest Central UK/Ireland; ProQuest Central
subjects	Bandwidths ; Big Data ; Clusters ; Computer Communication Networks ; Computer Science ; Consumption ; Data transfer (computers) ; Data transmission ; Information industry ; Operating Systems ; Optimization ; Performance enhancement ; Priority scheduling ; Processor Architectures ; Resource allocation ; Scheduling ; Task scheduling
title	A strategy for scheduling reduce task based on intermediate data locality of the MapReduce
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T05%3A09%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20strategy%20for%20scheduling%20reduce%20task%20based%20on%20intermediate%20data%20locality%20of%20the%20MapReduce&rft.jtitle=Cluster%20computing&rft.au=Shang,%20Fengjun&rft.date=2017-12-01&rft.volume=20&rft.issue=4&rft.spage=2821&rft.epage=2831&rft.pages=2821-2831&rft.issn=1386-7857&rft.eissn=1573-7543&rft_id=info:doi/10.1007/s10586-017-0972-7&rft_dat=%3Cproquest_cross%3E2918217462%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918217462&rft_id=info:pmid/&rfr_iscdi=true