A strategy for scheduling reduce task based on intermediate data locality of the MapReduce
This paper studies task scheduling, from the perspective of resource allocation and management, as a way to improve the performance of the Hadoop system. To save network bandwidth in a Hadoop cluster environment and improve system performance, a ReduceTask schedul...
Published in: | Cluster computing 2017-12, Vol.20 (4), p.2821-2831 |
---|---|
Main authors: | Shang, Fengjun ; Chen, Xuanling ; Yan, Chenyun |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 2831 |
---|---|
container_issue | 4 |
container_start_page | 2821 |
container_title | Cluster computing |
container_volume | 20 |
creator | Shang, Fengjun ; Chen, Xuanling ; Yan, Chenyun |
description | This paper studies task scheduling, from the perspective of resource allocation and management, as a way to improve the performance of the Hadoop system. To save network bandwidth in a Hadoop cluster environment and improve system performance, an improved data-locality-based ReduceTask scheduling strategy is proposed. During the MapReduce stage, two main data streams cross the cluster network: slow-task migration and remote copies of data. These two overlapping bursts of data transfer can easily become bottlenecks in the cluster network. To reduce the amount of remotely copied data, we use data locality to establish a minimum network resource consumption model (MNRC), which calculates the network resource consumption of a ReduceTask. Based on this model, we design a delay priority scheduling policy for ReduceTasks that is driven by the cost of network resource consumption. Finally, MNRC is verified by simulation experiments. Evaluation results show that MNRC reduces cluster network resource consumption by an average of 7.5% in a heterogeneous environment. |
doi_str_mv | 10.1007/s10586-017-0972-7 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2918217462</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2918217462</sourcerecordid><originalsourceid>FETCH-LOGICAL-c316t-c7dc1ca96ae0975e1c139acf533826fae2f01892086bbde49d2804499e8e8bf03</originalsourceid><addsrcrecordid>eNp1kE9LAzEQxYMoWKsfwFvAczR_djfJsRS1QkUQvXgJ2eyk3brdrUl66Lc3dQVPnubBvN-b4SF0zegto1TeRUZLVRHKJKFaciJP0ISVUhBZFuI0a5G3UpXyHF3EuKH06NIT9DHDMQWbYHXAfgg4ujU0-67tVzhk4QAnGz9xbSM0eOhx2ycIW2jajODGJou7wdmuTQc8eJzWgJ_t7vWHvERn3nYRrn7nFL0_3L_NF2T58vg0ny2JE6xKxMnGMWd1ZSH_VAJzTGjrfCmE4pW3wD1lSnOqqrpuoNANV7QotAYFqvZUTNHNmLsLw9ceYjKbYR_6fNJwzRRnsqh4drHR5cIQYwBvdqHd2nAwjJpjhWas0OQKzbEcIzPDRyZmb7-C8Jf8P_QNMtp0Hg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2918217462</pqid></control><display><type>article</type><title>A strategy for scheduling reduce task based on intermediate data locality of the MapReduce</title><source>SpringerLink Journals</source><source>ProQuest Central UK/Ireland</source><source>ProQuest Central</source><creator>Shang, Fengjun ; Chen, Xuanling ; Yan, Chenyun</creator><creatorcontrib>Shang, Fengjun ; Chen, Xuanling ; Yan, Chenyun</creatorcontrib><description>In this paper, researching on task scheduling is a way from the perspective of resource allocation and management to improve performance of Hadoop system. In order to save the network bandwidth resources in Hadoop cluster environment and improve the performance of Hadoop system, a ReduceTask scheduling strategy that based on data-locality is improved. In MapReduce stage, there are two main data streams in cluster network, they are slow task migration and remote copies of data. The two overlapping burst data transfer can easily become bottlenecks of the cluster network. 
To reduce the amount of remote copies of data, combining with data-locality, we establish a minimum network resource consumption model (MNRC). MNRC is used to calculate the network resources consumption of ReduceTask. Based on this model, we design a delay priority scheduling policy for the ReduceTask which is based on the cost of network resource consumption. Finally, MNRC is verified by simulation experiments. Evaluation results show that MNRC outperforms the saving cluster network resource by an average of 7.5% in heterogeneous.</description><identifier>ISSN: 1386-7857</identifier><identifier>EISSN: 1573-7543</identifier><identifier>DOI: 10.1007/s10586-017-0972-7</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Bandwidths ; Big Data ; Clusters ; Computer Communication Networks ; Computer Science ; Consumption ; Data transfer (computers) ; Data transmission ; Information industry ; Operating Systems ; Optimization ; Performance enhancement ; Priority scheduling ; Processor Architectures ; Resource allocation ; Scheduling ; Task scheduling</subject><ispartof>Cluster computing, 2017-12, Vol.20 (4), p.2821-2831</ispartof><rights>Springer Science+Business Media, LLC 2017</rights><rights>Springer Science+Business Media, LLC 
2017.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c316t-c7dc1ca96ae0975e1c139acf533826fae2f01892086bbde49d2804499e8e8bf03</citedby><cites>FETCH-LOGICAL-c316t-c7dc1ca96ae0975e1c139acf533826fae2f01892086bbde49d2804499e8e8bf03</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10586-017-0972-7$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2918217462?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,777,781,21369,27905,27906,33725,41469,42538,43786,51300,64364,64368,72218</link.rule.ids></links><search><creatorcontrib>Shang, Fengjun</creatorcontrib><creatorcontrib>Chen, Xuanling</creatorcontrib><creatorcontrib>Yan, Chenyun</creatorcontrib><title>A strategy for scheduling reduce task based on intermediate data locality of the MapReduce</title><title>Cluster computing</title><addtitle>Cluster Comput</addtitle><description>In this paper, researching on task scheduling is a way from the perspective of resource allocation and management to improve performance of Hadoop system. In order to save the network bandwidth resources in Hadoop cluster environment and improve the performance of Hadoop system, a ReduceTask scheduling strategy that based on data-locality is improved. In MapReduce stage, there are two main data streams in cluster network, they are slow task migration and remote copies of data. The two overlapping burst data transfer can easily become bottlenecks of the cluster network. To reduce the amount of remote copies of data, combining with data-locality, we establish a minimum network resource consumption model (MNRC). MNRC is used to calculate the network resources consumption of ReduceTask. 
Based on this model, we design a delay priority scheduling policy for the ReduceTask which is based on the cost of network resource consumption. Finally, MNRC is verified by simulation experiments. Evaluation results show that MNRC outperforms the saving cluster network resource by an average of 7.5% in heterogeneous.</description><subject>Bandwidths</subject><subject>Big Data</subject><subject>Clusters</subject><subject>Computer Communication Networks</subject><subject>Computer Science</subject><subject>Consumption</subject><subject>Data transfer (computers)</subject><subject>Data transmission</subject><subject>Information industry</subject><subject>Operating Systems</subject><subject>Optimization</subject><subject>Performance enhancement</subject><subject>Priority scheduling</subject><subject>Processor Architectures</subject><subject>Resource allocation</subject><subject>Scheduling</subject><subject>Task scheduling</subject><issn>1386-7857</issn><issn>1573-7543</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp1kE9LAzEQxYMoWKsfwFvAczR_djfJsRS1QkUQvXgJ2eyk3brdrUl66Lc3dQVPnubBvN-b4SF0zegto1TeRUZLVRHKJKFaciJP0ISVUhBZFuI0a5G3UpXyHF3EuKH06NIT9DHDMQWbYHXAfgg4ujU0-67tVzhk4QAnGz9xbSM0eOhx2ycIW2jajODGJou7wdmuTQc8eJzWgJ_t7vWHvERn3nYRrn7nFL0_3L_NF2T58vg0ny2JE6xKxMnGMWd1ZSH_VAJzTGjrfCmE4pW3wD1lSnOqqrpuoNANV7QotAYFqvZUTNHNmLsLw9ceYjKbYR_6fNJwzRRnsqh4drHR5cIQYwBvdqHd2nAwjJpjhWas0OQKzbEcIzPDRyZmb7-C8Jf8P_QNMtp0Hg</recordid><startdate>20171201</startdate><enddate>20171201</enddate><creator>Shang, Fengjun</creator><creator>Chen, Xuanling</creator><creator>Yan, Chenyun</creator><general>Springer US</general><general>Springer Nature 
B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope></search><sort><creationdate>20171201</creationdate><title>A strategy for scheduling reduce task based on intermediate data locality of the MapReduce</title><author>Shang, Fengjun ; Chen, Xuanling ; Yan, Chenyun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c316t-c7dc1ca96ae0975e1c139acf533826fae2f01892086bbde49d2804499e8e8bf03</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Bandwidths</topic><topic>Big Data</topic><topic>Clusters</topic><topic>Computer Communication Networks</topic><topic>Computer Science</topic><topic>Consumption</topic><topic>Data transfer (computers)</topic><topic>Data transmission</topic><topic>Information industry</topic><topic>Operating Systems</topic><topic>Optimization</topic><topic>Performance enhancement</topic><topic>Priority scheduling</topic><topic>Processor Architectures</topic><topic>Resource allocation</topic><topic>Scheduling</topic><topic>Task scheduling</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Shang, Fengjun</creatorcontrib><creatorcontrib>Chen, Xuanling</creatorcontrib><creatorcontrib>Yan, Chenyun</creatorcontrib><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest 
Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><jtitle>Cluster computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Shang, Fengjun</au><au>Chen, Xuanling</au><au>Yan, Chenyun</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A strategy for scheduling reduce task based on intermediate data locality of the MapReduce</atitle><jtitle>Cluster computing</jtitle><stitle>Cluster Comput</stitle><date>2017-12-01</date><risdate>2017</risdate><volume>20</volume><issue>4</issue><spage>2821</spage><epage>2831</epage><pages>2821-2831</pages><issn>1386-7857</issn><eissn>1573-7543</eissn><abstract>In this paper, researching on task scheduling is a way from the perspective of resource allocation and management to improve performance of Hadoop system. In order to save the network bandwidth resources in Hadoop cluster environment and improve the performance of Hadoop system, a ReduceTask scheduling strategy that based on data-locality is improved. In MapReduce stage, there are two main data streams in cluster network, they are slow task migration and remote copies of data. The two overlapping burst data transfer can easily become bottlenecks of the cluster network. 
To reduce the amount of remote copies of data, combining with data-locality, we establish a minimum network resource consumption model (MNRC). MNRC is used to calculate the network resources consumption of ReduceTask. Based on this model, we design a delay priority scheduling policy for the ReduceTask which is based on the cost of network resource consumption. Finally, MNRC is verified by simulation experiments. Evaluation results show that MNRC outperforms the saving cluster network resource by an average of 7.5% in heterogeneous.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10586-017-0972-7</doi><tpages>11</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1386-7857 |
ispartof | Cluster computing, 2017-12, Vol.20 (4), p.2821-2831 |
issn | 1386-7857 ; 1573-7543 |
language | eng |
recordid | cdi_proquest_journals_2918217462 |
source | SpringerLink Journals; ProQuest Central UK/Ireland; ProQuest Central |
subjects | Bandwidths ; Big Data ; Clusters ; Computer Communication Networks ; Computer Science ; Consumption ; Data transfer (computers) ; Data transmission ; Information industry ; Operating Systems ; Optimization ; Performance enhancement ; Priority scheduling ; Processor Architectures ; Resource allocation ; Scheduling ; Task scheduling |
title | A strategy for scheduling reduce task based on intermediate data locality of the MapReduce |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T05%3A09%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20strategy%20for%20scheduling%20reduce%20task%20based%20on%20intermediate%20data%20locality%20of%20the%20MapReduce&rft.jtitle=Cluster%20computing&rft.au=Shang,%20Fengjun&rft.date=2017-12-01&rft.volume=20&rft.issue=4&rft.spage=2821&rft.epage=2831&rft.pages=2821-2831&rft.issn=1386-7857&rft.eissn=1573-7543&rft_id=info:doi/10.1007/s10586-017-0972-7&rft_dat=%3Cproquest_cross%3E2918217462%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918217462&rft_id=info:pmid/&rfr_iscdi=true |