An adaptive failure recovery mechanism based on asymmetric routing for data center networks

As the infrastructure of high-performance computing, the data center network plays an important role. Because network failures occur frequently, data center networks demand high-performance, robust, and energy-efficient failure recovery mechanisms. Despite progress, the existing work still has a huge scop...

Bibliographic details
Published in: The Journal of supercomputing 2021-02, Vol.77 (2), p.2103-2123
Main authors: Liu, Yong; Gu, Huaxi; Wang, Kun; Yu, Xiaoshan; Wang, Yunhao
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 2123
container_issue 2
container_start_page 2103
container_title The Journal of supercomputing
container_volume 77
creator Liu, Yong
Gu, Huaxi
Wang, Kun
Yu, Xiaoshan
Wang, Yunhao
description As the infrastructure of high-performance computing, the data center network plays an important role. Because network failures occur frequently, data center networks demand high-performance, robust, and energy-efficient failure recovery mechanisms. Despite progress, existing work still leaves considerable room for improvement in meeting these requirements. Backup-based failure recovery schemes reserve backup paths in advance, which results in large energy consumption under normal network conditions. To solve this energy consumption problem, adaptive failure recovery schemes have been proposed that calculate a rerouting path for the traffic on the failed link, which reduces energy consumption. However, most adaptive failure recovery solutions apply multi-path routing to calculate the rerouting path. Because multi-path routing cannot detect the congestion status of a path under the asymmetric topology caused by link failures, the network becomes congested, which reduces its robustness. In view of this, we design and evaluate AFRM, a novel adaptive failure recovery mechanism that overcomes these challenges. AFRM uses congestion-aware asymmetric routing to calculate the rerouting path and is more robust to topological asymmetries than existing schemes. The asymmetric routing dynamically schedules flows to the path with the least marginal cost, which makes AFRM much more energy-efficient. Additionally, AFRM achieves fast link failure detection based on hash storage and flow table matching. Evaluations show that AFRM balances the trade-off between failure recovery time and energy consumption, reduces flow completion time, and increases network throughput compared with existing schemes.
doi_str_mv 10.1007/s11227-020-03337-4
format Article
publisher New York: Springer US
rights Springer Science+Business Media, LLC, part of Springer Nature 2020
orcid https://orcid.org/0000-0002-6409-2229
toplevel peer_reviewed
online_resources
sourcetype Aggregation Database
collection CrossRef
creationdate 2021-02-01
tpages 21
pqid 2480787498
fulltext fulltext
identifier ISSN: 0920-8542
ispartof The Journal of supercomputing, 2021-02, Vol.77 (2), p.2103-2123
issn 0920-8542
1573-0484
language eng
recordid cdi_proquest_journals_2480787498
source SpringerLink Journals - AutoHoldings
subjects Asymmetry
Compilers
Completion time
Computer centers
Computer Science
Data centers
Energy consumption
Energy recovery
Failure
Failure detection
Interpreters
Processor Architectures
Programming Languages
Recovery time
Rerouteing
Robustness
Route planning
Schedules
Topology
title An adaptive failure recovery mechanism based on asymmetric routing for data center networks
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T06%3A29%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20adaptive%20failure%20recovery%20mechanism%20based%20on%20asymmetric%20routing%20for%20data%20center%20networks&rft.jtitle=The%20Journal%20of%20supercomputing&rft.au=Liu,%20Yong&rft.date=2021-02-01&rft.volume=77&rft.issue=2&rft.spage=2103&rft.epage=2123&rft.pages=2103-2123&rft.issn=0920-8542&rft.eissn=1573-0484&rft_id=info:doi/10.1007/s11227-020-03337-4&rft_dat=%3Cproquest_cross%3E2480787498%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2480787498&rft_id=info:pmid/&rfr_iscdi=true