A data-aware workflow scheduling algorithm for heterogeneous distributed systems

The workflow scheduling problem in heterogeneous distributed systems is hard to solve because both the intermediate data transfer time and the computation time of each task must be considered. The heterogeneity of the computing power of distributed computational sites and of the bandwidth between them makes...

Bibliographic details
Main authors: Dengpan Yin, Kosar, Tevfik
Format: Conference proceeding
Language: eng
container_end_page 120
container_issue
container_start_page 114
container_title
container_volume
creator Dengpan Yin
Kosar, Tevfik
description The workflow scheduling problem in heterogeneous distributed systems is hard to solve because both the intermediate data transfer time and the computation time of each task must be considered. The heterogeneity of the computing power of distributed computational sites and of the bandwidth between them makes the scheduling problem challenging. In this study, we improve a heuristic-based data-aware algorithm to find the optimal schedule so that the turnaround time of the workflow is minimized. Our improved algorithm outperforms the existing algorithms in both performance and time efficiency in most cases. We also extend our algorithm to solve the co-scheduling problem, in which each task of the workflow can request data from a remote data site before its execution and store important intermediate data to a remote data site after its execution. The results show that the turnaround time of the workflow can be shortened significantly using our data-aware algorithm compared to the existing optimal algorithms.
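The trade-off the description names (computation time at a site versus the time to move intermediate data between sites) can be illustrated with a minimal sketch. This is not the authors' algorithm, which this record does not reproduce; it is a generic greedy earliest-finish-time heuristic, and every name in it (`schedule`, `work`, `deps`, `data`, `speed`, `bandwidth`, the task and site labels) is a hypothetical chosen for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def schedule(work, deps, data, speed, bandwidth):
    """Greedy earliest-finish-time placement (illustrative only).

    work[t]: compute units of task t
    deps[t]: set of predecessor tasks of t
    data[(p, t)]: size of the intermediate data p sends to t
    speed[s]: compute units per second at site s
    bandwidth[(a, b)]: data units per second from site a to site b
    Returns (placement, turnaround time).
    """
    site_free = {s: 0.0 for s in speed}  # time at which each site is idle
    placed, finish = {}, {}
    for t in TopologicalSorter(deps).static_order():
        best = None
        for s in speed:
            # Inputs are ready once every predecessor's output reaches s;
            # data already at s costs no transfer time.
            ready = 0.0
            for p in deps.get(t, ()):
                if placed[p] == s:
                    transfer = 0.0
                else:
                    transfer = data[(p, t)] / bandwidth[(placed[p], s)]
                ready = max(ready, finish[p] + transfer)
            end = max(ready, site_free[s]) + work[t] / speed[s]
            if best is None or end < best[0]:
                best = (end, s)
        finish[t], placed[t] = best
        site_free[best[1]] = best[0]
    return placed, max(finish.values())


# Hypothetical workflow: task a feeds b (large output) and c (small output).
work = {"a": 4, "b": 6, "c": 2}
deps = {"a": set(), "b": {"a"}, "c": {"a"}}
data = {("a", "b"): 10, ("a", "c"): 1}
speed = {"fast": 2.0, "slow": 1.0}
bandwidth = {("fast", "slow"): 1.0, ("slow", "fast"): 1.0}

placed, turnaround = schedule(work, deps, data, speed, bandwidth)
print(placed, turnaround)
```

In this toy input, task `b` stays on the same site as `a` because shipping its 10 units of input over the slow link would outweigh any gain from a free processor. A scheduler that ignored data placement could not see that cost, which is the motivation for data-aware scheduling.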
doi_str_mv 10.1109/HPCSim.2011.5999814
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5999814</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5999814</ieee_id><sourcerecordid>5999814</sourcerecordid><originalsourceid>FETCH-LOGICAL-i90t-35e6b6267d6a191e2411069a51d7d32eaa7ac2be08265144f7196e8da356f45d3</originalsourceid><addsrcrecordid>eNotj0FOwzAQRc0CCSg9QTe-QILHdpx4WUVAkSpRie6rST1JDEmDbFdRb08l-jdv9_Q-YysQOYCwL5td_eXHXAqAvLDWVqDv2BMYkJVWldQPbBnjt7jOGAtGPLLdmjtMmOGMgfg8hZ92mGYejz258-BPHcehm4JP_cjbKfCeEoWpoxNN58idjyn45pzI8XiJicb4zO5bHCItb1yw_dvrvt5k28_3j3q9zbwVKVMFmcZIUzqDYIGkvvYbiwW40ilJiCUeZUOikqYArdsSrKHKoSpMqwunFmz1r_VEdPgNfsRwOdw-qz932E7F</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>A data-aware workflow scheduling algorithm for heterogeneous distributed systems</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Dengpan Yin ; Kosar, Tevfik</creator><creatorcontrib>Dengpan Yin ; Kosar, Tevfik</creatorcontrib><description>The workflow scheduling problem in heterogeneous distributed systems is hard to solve due to both intermediate data transfer time and the computation time for each task being considered. The heterogeneity of the computing power of distributed computational sites and the band width between them makes the scheduling problem challenging. In this study, we improve a heuristic-based data-aware algorithm to find the optimal scheduling so that the turnaround time of the workflow is minimized. Our improved algorithm outperforms the existing algorithms in both performance and time efficiency in most cases. We also extend our algorithm to solve the co-scheduling problem. In this problem, each task of the workflow can request data from a remote data site before its execution; and also store important intermediate data to a remote data site after the execution. 
The results show that the turnaround time of the workflow can be shortened significantly using our data-aware algorithm compared to the existing optimal algorithms.</description><identifier>EISBN: 1612843824</identifier><identifier>EISBN: 9781612843834</identifier><identifier>EISBN: 1612843832</identifier><identifier>EISBN: 9781612843827</identifier><identifier>DOI: 10.1109/HPCSim.2011.5999814</identifier><language>eng</language><publisher>IEEE</publisher><subject>Bandwidth ; Data intensive supercomputing ; Distributed databases ; Grid and cluster computing ; Large scale scientific computing ; Large scale systems ; Optimal scheduling ; Processor scheduling ; Program processors ; Scheduling ; Search problems ; Workflow scheduling</subject><ispartof>2011 International Conference on High Performance Computing &amp; Simulation, 2011, p.114-120</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5999814$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2051,27904,54898</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5999814$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Dengpan Yin</creatorcontrib><creatorcontrib>Kosar, Tevfik</creatorcontrib><title>A data-aware workflow scheduling algorithm for heterogeneous distributed systems</title><title>2011 International Conference on High Performance Computing &amp; Simulation</title><addtitle>HPCSim</addtitle><description>The workflow scheduling problem in heterogeneous distributed systems is hard to solve due to both intermediate data transfer time and the computation time for each task being considered. 
The heterogeneity of the computing power of distributed computational sites and the band width between them makes the scheduling problem challenging. In this study, we improve a heuristic-based data-aware algorithm to find the optimal scheduling so that the turnaround time of the workflow is minimized. Our improved algorithm outperforms the existing algorithms in both performance and time efficiency in most cases. We also extend our algorithm to solve the co-scheduling problem. In this problem, each task of the workflow can request data from a remote data site before its execution; and also store important intermediate data to a remote data site after the execution. The results show that the turnaround time of the workflow can be shortened significantly using our data-aware algorithm compared to the existing optimal algorithms.</description><subject>Bandwidth</subject><subject>Data intensive supercomputing</subject><subject>Distributed databases</subject><subject>Grid and cluster computing</subject><subject>Large scale scientific computing</subject><subject>Large scale systems</subject><subject>Optimal scheduling</subject><subject>Processor scheduling</subject><subject>Program processors</subject><subject>Scheduling</subject><subject>Search problems</subject><subject>Workflow 
scheduling</subject><isbn>1612843824</isbn><isbn>9781612843834</isbn><isbn>1612843832</isbn><isbn>9781612843827</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2011</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNotj0FOwzAQRc0CCSg9QTe-QILHdpx4WUVAkSpRie6rST1JDEmDbFdRb08l-jdv9_Q-YysQOYCwL5td_eXHXAqAvLDWVqDv2BMYkJVWldQPbBnjt7jOGAtGPLLdmjtMmOGMgfg8hZ92mGYejz258-BPHcehm4JP_cjbKfCeEoWpoxNN58idjyn45pzI8XiJicb4zO5bHCItb1yw_dvrvt5k28_3j3q9zbwVKVMFmcZIUzqDYIGkvvYbiwW40ilJiCUeZUOikqYArdsSrKHKoSpMqwunFmz1r_VEdPgNfsRwOdw-qz932E7F</recordid><startdate>201107</startdate><enddate>201107</enddate><creator>Dengpan Yin</creator><creator>Kosar, Tevfik</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>201107</creationdate><title>A data-aware workflow scheduling algorithm for heterogeneous distributed systems</title><author>Dengpan Yin ; Kosar, Tevfik</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i90t-35e6b6267d6a191e2411069a51d7d32eaa7ac2be08265144f7196e8da356f45d3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2011</creationdate><topic>Bandwidth</topic><topic>Data intensive supercomputing</topic><topic>Distributed databases</topic><topic>Grid and cluster computing</topic><topic>Large scale scientific computing</topic><topic>Large scale systems</topic><topic>Optimal scheduling</topic><topic>Processor scheduling</topic><topic>Program processors</topic><topic>Scheduling</topic><topic>Search problems</topic><topic>Workflow scheduling</topic><toplevel>online_resources</toplevel><creatorcontrib>Dengpan Yin</creatorcontrib><creatorcontrib>Kosar, Tevfik</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE 
Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Dengpan Yin</au><au>Kosar, Tevfik</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>A data-aware workflow scheduling algorithm for heterogeneous distributed systems</atitle><btitle>2011 International Conference on High Performance Computing &amp; Simulation</btitle><stitle>HPCSim</stitle><date>2011-07</date><risdate>2011</risdate><spage>114</spage><epage>120</epage><pages>114-120</pages><eisbn>1612843824</eisbn><eisbn>9781612843834</eisbn><eisbn>1612843832</eisbn><eisbn>9781612843827</eisbn><abstract>The workflow scheduling problem in heterogeneous distributed systems is hard to solve due to both intermediate data transfer time and the computation time for each task being considered. The heterogeneity of the computing power of distributed computational sites and the band width between them makes the scheduling problem challenging. In this study, we improve a heuristic-based data-aware algorithm to find the optimal scheduling so that the turnaround time of the workflow is minimized. Our improved algorithm outperforms the existing algorithms in both performance and time efficiency in most cases. We also extend our algorithm to solve the co-scheduling problem. In this problem, each task of the workflow can request data from a remote data site before its execution; and also store important intermediate data to a remote data site after the execution. 
The results show that the turnaround time of the workflow can be shortened significantly using our data-aware algorithm compared to the existing optimal algorithms.</abstract><pub>IEEE</pub><doi>10.1109/HPCSim.2011.5999814</doi><tpages>7</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier EISBN: 1612843824
ispartof 2011 International Conference on High Performance Computing & Simulation, 2011, p.114-120
issn
language eng
recordid cdi_ieee_primary_5999814
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Bandwidth
Data intensive supercomputing
Distributed databases
Grid and cluster computing
Large scale scientific computing
Large scale systems
Optimal scheduling
Processor scheduling
Program processors
Scheduling
Search problems
Workflow scheduling
title A data-aware workflow scheduling algorithm for heterogeneous distributed systems
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T01%3A38%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20data-aware%20workflow%20scheduling%20algorithm%20for%20heterogeneous%20distributed%20systems&rft.btitle=2011%20International%20Conference%20on%20High%20Performance%20Computing%20&%20Simulation&rft.au=Dengpan%20Yin&rft.date=2011-07&rft.spage=114&rft.epage=120&rft.pages=114-120&rft_id=info:doi/10.1109/HPCSim.2011.5999814&rft_dat=%3Cieee_6IE%3E5999814%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=1612843824&rft.eisbn_list=9781612843834&rft.eisbn_list=1612843832&rft.eisbn_list=9781612843827&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5999814&rfr_iscdi=true