Grid Service Reliability Modeling and Optimal Task Scheduling Considering Fault Recovery

There has been quite some research on the development of tools and techniques for grid systems, yet some important issues, e.g., grid service reliability and task scheduling in the grid, have not been sufficiently studied. For some grid services which have large subtasks requiring time-consuming com...

Bibliographic Details
Published in:	IEEE transactions on reliability, 2011-03, Vol.60 (1), p.263-274
Main authors:	Guo, Suchang; Huang, Hong-Zhong; Wang, Zhonglai; Xie, Min
Format:	Article
Language:	eng
Subjects:	Algorithms; Ant colony optimization; Colonies; Computation; fault recovery; Fault tolerance; Fault tolerant systems; Faults; grid service reliability; Hardware; Mathematical models; Operations research; Optimization; Peer to peer computing; Quality of service; recoverability; Recovery; Software; Studies; Task scheduling

container_end_page	274
container_issue	1
container_start_page	263
container_title	IEEE transactions on reliability
container_volume	60
creator	Guo, Suchang; Huang, Hong-Zhong; Wang, Zhonglai; Xie, Min
description	There has been quite some research on the development of tools and techniques for grid systems, yet some important issues, e.g., grid service reliability and task scheduling in the grid, have not been sufficiently studied. For some grid services which have large subtasks requiring time-consuming computation, the reliability of grid service could be rather low. To resolve this problem, this paper introduces Local Node Fault Recovery (LNFR) mechanism into grid systems, and presents an in-depth study on grid service reliability modeling and analysis with this kind of fault recovery. To make LNFR mechanism practical, some constraints, i.e. the life times of subtasks, and the numbers of recoveries performed in grid nodes, are introduced; and grid service reliability models under these practical constraints are developed. Based on the proposed grid service reliability model, a multi-objective task scheduling optimization model is presented, and an ant colony optimization (ACO) algorithm is developed to solve it effectively. A numerical example is given to illustrate the influence of fault recovery on grid service reliability, and show a high efficiency of ACO in solving the grid task scheduling problem.
doi_str_mv	10.1109/TR.2010.2104190
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_miscellaneous_864397750</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5699967</ieee_id><sourcerecordid>864397750</sourcerecordid><originalsourceid>FETCH-LOGICAL-c386t-7a774985ed138242c37566d8e4a3dae4aaf4ea7983a350e1b06335a6e83c5fd13</originalsourceid><addsrcrecordid>eNpdUE1PwkAQ3RhNRPTswUvjxVNht_t9NETQBEMCNfG2WdpBF0uLuy0J_95FiAcvM_My783HQ-iW4AEhWA_z-SDDEWQEM6LxGeoRzlVKZEbOUQ9jolLNM32JrkJYR8iYVj30PvGuTBbgd66AZA6Vs0tXuXafvDZlRPVHYusymW1bt7FVktvwlSyKTyi7396oqYMrwR_qse2qNo4omh34_TW6WNkqwM0p99Hb-CkfPafT2eRl9DhNC6pEm0orZTyEQ0moylhWUMmFKBUwS0sbo10xsFIrainHQJZYUMqtAEULvoqiPno4zt365ruD0JqNCwVUla2h6YJRglEtJceRef-PuW46X8fjjOKCMaLilj4aHkmFb0LwsDJbHz_3e0OwOfhs8rk5-GxOPkfF3VHhAOCPzYXWWkj6A8n0eFw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>856441898</pqid></control><display><type>article</type><title>Grid Service Reliability Modeling and Optimal Task Scheduling Considering Fault Recovery</title><source>IEEE Electronic Library (IEL)</source><creator>Guo, Suchang ; Huang, Hong-Zhong ; Wang, Zhonglai ; Xie, Min</creator><creatorcontrib>Guo, Suchang ; Huang, Hong-Zhong ; Wang, Zhonglai ; Xie, Min</creatorcontrib><description>There has been quite some research on the development of tools and techniques for grid systems, yet some important issues, e.g., grid service reliability and task scheduling in the grid, have not been sufficiently studied. For some grid services which have large subtasks requiring time-consuming computation, the reliability of grid service could be rather low. To resolve this problem, this paper introduces Local Node Fault Recovery (LNFR) mechanism into grid systems, and presents an in-depth study on grid service reliability modeling and analysis with this kind of fault recovery. To make LNFR mechanism practical, some constraints, i.e. 
the life times of subtasks, and the numbers of recoveries performed in grid nodes, are introduced; and grid service reliability models under these practical constraints are developed. Based on the proposed grid service reliability model, a multi-objective task scheduling optimization model is presented, and an ant colony optimization (ACO) algorithm is developed to solve it effectively. A numerical example is given to illustrate the influence of fault recovery on grid service reliability, and show a high efficiency of ACO in solving the grid task scheduling problem.</description><identifier>ISSN: 0018-9529</identifier><identifier>EISSN: 1558-1721</identifier><identifier>DOI: 10.1109/TR.2010.2104190</identifier><identifier>CODEN: IERQAD</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Algorithms ; Ant colony optimization ; Colonies ; Computation ; fault recovery ; Fault tolerance ; Fault tolerant systems ; Faults ; grid service reliability ; Hardware ; Mathematical models ; Operations research ; Optimization ; Peer to peer computing ; Quality of service ; recoverability ; Recovery ; Software ; Studies ; Task scheduling</subject><ispartof>IEEE transactions on reliability, 2011-03, Vol.60 (1), p.263-274</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) Mar 2011</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c386t-7a774985ed138242c37566d8e4a3dae4aaf4ea7983a350e1b06335a6e83c5fd13</citedby><cites>FETCH-LOGICAL-c386t-7a774985ed138242c37566d8e4a3dae4aaf4ea7983a350e1b06335a6e83c5fd13</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5699967$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>315,781,785,797,27929,27930,54763</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5699967$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Guo, Suchang</creatorcontrib><creatorcontrib>Huang, Hong-Zhong</creatorcontrib><creatorcontrib>Wang, Zhonglai</creatorcontrib><creatorcontrib>Xie, Min</creatorcontrib><title>Grid Service Reliability Modeling and Optimal Task Scheduling Considering Fault Recovery</title><title>IEEE transactions on reliability</title><addtitle>TR</addtitle><description>There has been quite some research on the development of tools and techniques for grid systems, yet some important issues, e.g., grid service reliability and task scheduling in the grid, have not been sufficiently studied. For some grid services which have large subtasks requiring time-consuming computation, the reliability of grid service could be rather low. To resolve this problem, this paper introduces Local Node Fault Recovery (LNFR) mechanism into grid systems, and presents an in-depth study on grid service reliability modeling and analysis with this kind of fault recovery. To make LNFR mechanism practical, some constraints, i.e. the life times of subtasks, and the numbers of recoveries performed in grid nodes, are introduced; and grid service reliability models under these practical constraints are developed. 
Based on the proposed grid service reliability model, a multi-objective task scheduling optimization model is presented, and an ant colony optimization (ACO) algorithm is developed to solve it effectively. A numerical example is given to illustrate the influence of fault recovery on grid service reliability, and show a high efficiency of ACO in solving the grid task scheduling problem.</description><subject>Algorithms</subject><subject>Ant colony optimization</subject><subject>Colonies</subject><subject>Computation</subject><subject>fault recovery</subject><subject>Fault tolerance</subject><subject>Fault tolerant systems</subject><subject>Faults</subject><subject>grid service reliability</subject><subject>Hardware</subject><subject>Mathematical models</subject><subject>Operations research</subject><subject>Optimization</subject><subject>Peer to peer computing</subject><subject>Quality of service</subject><subject>recoverability</subject><subject>Recovery</subject><subject>Software</subject><subject>Studies</subject><subject>Task scheduling</subject><issn>0018-9529</issn><issn>1558-1721</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2011</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdUE1PwkAQ3RhNRPTswUvjxVNht_t9NETQBEMCNfG2WdpBF0uLuy0J_95FiAcvM_My783HQ-iW4AEhWA_z-SDDEWQEM6LxGeoRzlVKZEbOUQ9jolLNM32JrkJYR8iYVj30PvGuTBbgd66AZA6Vs0tXuXafvDZlRPVHYusymW1bt7FVktvwlSyKTyi7396oqYMrwR_qse2qNo4omh34_TW6WNkqwM0p99Hb-CkfPafT2eRl9DhNC6pEm0orZTyEQ0moylhWUMmFKBUwS0sbo10xsFIrainHQJZYUMqtAEULvoqiPno4zt365ruD0JqNCwVUla2h6YJRglEtJceRef-PuW46X8fjjOKCMaLilj4aHkmFb0LwsDJbHz_3e0OwOfhs8rk5-GxOPkfF3VHhAOCPzYXWWkj6A8n0eFw</recordid><startdate>201103</startdate><enddate>201103</enddate><creator>Guo, Suchang</creator><creator>Huang, Hong-Zhong</creator><creator>Wang, Zhonglai</creator><creator>Xie, Min</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>8FD</scope><scope>L7M</scope><scope>F28</scope><scope>FR3</scope></search><sort><creationdate>201103</creationdate><title>Grid Service Reliability Modeling and Optimal Task Scheduling Considering Fault Recovery</title><author>Guo, Suchang ; Huang, Hong-Zhong ; Wang, Zhonglai ; Xie, Min</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c386t-7a774985ed138242c37566d8e4a3dae4aaf4ea7983a350e1b06335a6e83c5fd13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2011</creationdate><topic>Algorithms</topic><topic>Ant colony optimization</topic><topic>Colonies</topic><topic>Computation</topic><topic>fault recovery</topic><topic>Fault tolerance</topic><topic>Fault tolerant systems</topic><topic>Faults</topic><topic>grid service reliability</topic><topic>Hardware</topic><topic>Mathematical models</topic><topic>Operations research</topic><topic>Optimization</topic><topic>Peer to peer computing</topic><topic>Quality of service</topic><topic>recoverability</topic><topic>Recovery</topic><topic>Software</topic><topic>Studies</topic><topic>Task scheduling</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Guo, Suchang</creatorcontrib><creatorcontrib>Huang, Hong-Zhong</creatorcontrib><creatorcontrib>Wang, Zhonglai</creatorcontrib><creatorcontrib>Xie, Min</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>ANTE: 
Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><jtitle>IEEE transactions on reliability</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Guo, Suchang</au><au>Huang, Hong-Zhong</au><au>Wang, Zhonglai</au><au>Xie, Min</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Grid Service Reliability Modeling and Optimal Task Scheduling Considering Fault Recovery</atitle><jtitle>IEEE transactions on reliability</jtitle><stitle>TR</stitle><date>2011-03</date><risdate>2011</risdate><volume>60</volume><issue>1</issue><spage>263</spage><epage>274</epage><pages>263-274</pages><issn>0018-9529</issn><eissn>1558-1721</eissn><coden>IERQAD</coden><abstract>There has been quite some research on the development of tools and techniques for grid systems, yet some important issues, e.g., grid service reliability and task scheduling in the grid, have not been sufficiently studied. For some grid services which have large subtasks requiring time-consuming computation, the reliability of grid service could be rather low. To resolve this problem, this paper introduces Local Node Fault Recovery (LNFR) mechanism into grid systems, and presents an in-depth study on grid service reliability modeling and analysis with this kind of fault recovery. To make LNFR mechanism practical, some constraints, i.e. the life times of subtasks, and the numbers of recoveries performed in grid nodes, are introduced; and grid service reliability models under these practical constraints are developed. Based on the proposed grid service reliability model, a multi-objective task scheduling optimization model is presented, and an ant colony optimization (ACO) algorithm is developed to solve it effectively. 
A numerical example is given to illustrate the influence of fault recovery on grid service reliability, and show a high efficiency of ACO in solving the grid task scheduling problem.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TR.2010.2104190</doi><tpages>12</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 0018-9529
ispartof	IEEE transactions on reliability, 2011-03, Vol.60 (1), p.263-274
issn	0018-9529; 1558-1721
language	eng
recordid	cdi_proquest_miscellaneous_864397750
source	IEEE Electronic Library (IEL)
subjects	Algorithms; Ant colony optimization; Colonies; Computation; fault recovery; Fault tolerance; Fault tolerant systems; Faults; grid service reliability; Hardware; Mathematical models; Operations research; Optimization; Peer to peer computing; Quality of service; recoverability; Recovery; Software; Studies; Task scheduling
title	Grid Service Reliability Modeling and Optimal Task Scheduling Considering Fault Recovery
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-14T18%3A52%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Grid%20Service%20Reliability%20Modeling%20and%20Optimal%20Task%20Scheduling%20Considering%20Fault%20Recovery&rft.jtitle=IEEE%20transactions%20on%20reliability&rft.au=Guo,%20Suchang&rft.date=2011-03&rft.volume=60&rft.issue=1&rft.spage=263&rft.epage=274&rft.pages=263-274&rft.issn=0018-9529&rft.eissn=1558-1721&rft.coden=IERQAD&rft_id=info:doi/10.1109/TR.2010.2104190&rft_dat=%3Cproquest_RIE%3E864397750%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=856441898&rft_id=info:pmid/&rft_ieee_id=5699967&rfr_iscdi=true