An adaptive failure recovery mechanism based on asymmetric routing for data center networks
As the infrastructure of high-performance computing, the data center network plays an important role. Because network failures occur frequently, data center networks demand high-performance, robust, and energy-efficient failure recovery mechanisms. Despite progress, the existing work still has a huge scop...
Published in: | The Journal of supercomputing 2021-02, Vol.77 (2), p.2103-2123 |
---|---|
Main authors: | Liu, Yong ; Gu, Huaxi ; Wang, Kun ; Yu, Xiaoshan ; Wang, Yunhao |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 2123 |
---|---|
container_issue | 2 |
container_start_page | 2103 |
container_title | The Journal of supercomputing |
container_volume | 77 |
creator | Liu, Yong ; Gu, Huaxi ; Wang, Kun ; Yu, Xiaoshan ; Wang, Yunhao |
description | As the infrastructure of high-performance computing, the data center network plays an important role. Because network failures occur frequently, data center networks demand high-performance, robust, and energy-efficient failure recovery mechanisms. Despite progress, existing work still leaves considerable room for improvement in meeting these requirements. Backup-based failure recovery schemes reserve backup paths in advance, which results in high energy consumption under normal network conditions. To address this energy consumption problem, existing adaptive failure recovery schemes calculate a rerouting path for the traffic on the failed link, which reduces energy consumption. However, most adaptive failure recovery solutions apply multi-path routing to calculate the rerouting path. Because multi-path routing cannot detect the congestion status of a path under the asymmetric topology caused by link failures, the network becomes congested, which makes it less robust. In view of this, we design and evaluate AFRM, a novel adaptive failure recovery mechanism that overcomes these challenges. AFRM uses congestion-aware asymmetric routing to calculate the rerouting path and is more robust to topological asymmetries than existing schemes. The asymmetric routing dynamically schedules flows onto the path with the least marginal cost, which makes AFRM much more energy-efficient. Additionally, AFRM achieves fast link failure detection based on hash storage and flow table matching. Evaluations show that AFRM can balance the trade-off between failure recovery time and energy consumption, reduce flow completion time, and increase network throughput compared with existing schemes. |
doi_str_mv | 10.1007/s11227-020-03337-4 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2480787498</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2480787498</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-e734b852f7e0dd56255be0d94444868172a3ed12cfaf475699457f4827c0edbb3</originalsourceid><addsrcrecordid>eNp9kE1LAzEQhoMoWKt_wFPA82o-m-yxFL-g4EVPHkI2O6lbu5uaZCv990YreHMuMzDPOwMPQpeUXFNC1E2ilDFVEUYqwjlXlThCEyoVr4jQ4hhNSF1WWgp2is5SWhNCBFd8gl7nA7at3eZuB9jbbjNGwBFc2EHc4x7cmx261OPGJmhxKHDa9z3k2Dkcw5i7YYV9iLi12WIHQ4aIB8ifIb6nc3Ti7SbBxW-fope72-fFQ7V8un9czJeV47TOFSguGi2ZV0DaVs6YlE2ZalFKzzRVzHJoKXPeeqHkrK6FVF5ophyBtmn4FF0d7m5j-BghZbMOYxzKS8OEJkorUetCsQPlYkgpgjfb2PU27g0l5luiOUg0RaL5kWhECfFDKBV4WEH8O_1P6gsvsXUp</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2480787498</pqid></control><display><type>article</type><title>An adaptive failure recovery mechanism based on asymmetric routing for data center networks</title><source>SpringerLink Journals - AutoHoldings</source><creator>Liu, Yong ; Gu, Huaxi ; Wang, Kun ; Yu, Xiaoshan ; Wang, Yunhao</creator><creatorcontrib>Liu, Yong ; Gu, Huaxi ; Wang, Kun ; Yu, Xiaoshan ; Wang, Yunhao</creatorcontrib><description>As the infrastructure of high-performance computing, the data center network plays an important role. As network failures occur frequently, data center networks demand highly performed, robust, and energy-efficient failure recovery mechanisms. Despite process, the existing work still has a huge scope to improve to satisfy these requirements. The backup-based failure recovery schemes reserve backup paths in advance, which results in a large energy consumption under normal network conditions. 
In order to solve the energy consumption problem, the existing adaptive failure recovery schemes are proposed to calculate the rerouting path of the traffic on the failed link, which reduces the energy consumption. However, most adaptive fault recovery solutions apply multi-path routing to calculate the re-routing path. As multi-path routing cannot detect the congestion status of the path under the asymmetric topology caused by link failures, the network is congested, which ends up in less robustness of the network. In view of this, we design and evaluate AFRM, a novel adaptive failure recovery mechanism that overcomes these challenges. AFRM uses asymmetrical routing to calculate the re-routing path by being congestion-aware and is more robust to topological asymmetries compared with existing schemes. The asymmetrical routing dynamically schedules flows to the path with the least marginal cost, which makes AFRM much more energy-efficient. Additionally, AFRM achieves fast link failure detection based on hash storage and flow table matching. 
Evaluations show that AFRM can do the trade-off between failure recovery time and energy consumption, reduce flow completion time, and increase network throughput compared with existing schemes.</description><identifier>ISSN: 0920-8542</identifier><identifier>EISSN: 1573-0484</identifier><identifier>DOI: 10.1007/s11227-020-03337-4</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Asymmetry ; Compilers ; Completion time ; Computer centers ; Computer Science ; Data centers ; Energy consumption ; Energy recovery ; Failure ; Failure detection ; Interpreters ; Processor Architectures ; Programming Languages ; Recovery time ; Rerouteing ; Robustness ; Route planning ; Schedules ; Topology</subject><ispartof>The Journal of supercomputing, 2021-02, Vol.77 (2), p.2103-2123</ispartof><rights>Springer Science+Business Media, LLC, part of Springer Nature 2020</rights><rights>Springer Science+Business Media, LLC, part of Springer Nature 2020.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-e734b852f7e0dd56255be0d94444868172a3ed12cfaf475699457f4827c0edbb3</citedby><cites>FETCH-LOGICAL-c319t-e734b852f7e0dd56255be0d94444868172a3ed12cfaf475699457f4827c0edbb3</cites><orcidid>0000-0002-6409-2229</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11227-020-03337-4$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11227-020-03337-4$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27903,27904,41467,42536,51297</link.rule.ids></links><search><creatorcontrib>Liu, Yong</creatorcontrib><creatorcontrib>Gu, Huaxi</creatorcontrib><creatorcontrib>Wang, Kun</creatorcontrib><creatorcontrib>Yu, Xiaoshan</creatorcontrib><creatorcontrib>Wang, 
Yunhao</creatorcontrib><title>An adaptive failure recovery mechanism based on asymmetric routing for data center networks</title><title>The Journal of supercomputing</title><addtitle>J Supercomput</addtitle><description>As the infrastructure of high-performance computing, the data center network plays an important role. As network failures occur frequently, data center networks demand highly performed, robust, and energy-efficient failure recovery mechanisms. Despite process, the existing work still has a huge scope to improve to satisfy these requirements. The backup-based failure recovery schemes reserve backup paths in advance, which results in a large energy consumption under normal network conditions. In order to solve the energy consumption problem, the existing adaptive failure recovery schemes are proposed to calculate the rerouting path of the traffic on the failed link, which reduces the energy consumption. However, most adaptive fault recovery solutions apply multi-path routing to calculate the re-routing path. As multi-path routing cannot detect the congestion status of the path under the asymmetric topology caused by link failures, the network is congested, which ends up in less robustness of the network. In view of this, we design and evaluate AFRM, a novel adaptive failure recovery mechanism that overcomes these challenges. AFRM uses asymmetrical routing to calculate the re-routing path by being congestion-aware and is more robust to topological asymmetries compared with existing schemes. The asymmetrical routing dynamically schedules flows to the path with the least marginal cost, which makes AFRM much more energy-efficient. Additionally, AFRM achieves fast link failure detection based on hash storage and flow table matching. 
Evaluations show that AFRM can do the trade-off between failure recovery time and energy consumption, reduce flow completion time, and increase network throughput compared with existing schemes.</description><subject>Asymmetry</subject><subject>Compilers</subject><subject>Completion time</subject><subject>Computer centers</subject><subject>Computer Science</subject><subject>Data centers</subject><subject>Energy consumption</subject><subject>Energy recovery</subject><subject>Failure</subject><subject>Failure detection</subject><subject>Interpreters</subject><subject>Processor Architectures</subject><subject>Programming Languages</subject><subject>Recovery time</subject><subject>Rerouteing</subject><subject>Robustness</subject><subject>Route planning</subject><subject>Schedules</subject><subject>Topology</subject><issn>0920-8542</issn><issn>1573-0484</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNp9kE1LAzEQhoMoWKt_wFPA82o-m-yxFL-g4EVPHkI2O6lbu5uaZCv990YreHMuMzDPOwMPQpeUXFNC1E2ilDFVEUYqwjlXlThCEyoVr4jQ4hhNSF1WWgp2is5SWhNCBFd8gl7nA7at3eZuB9jbbjNGwBFc2EHc4x7cmx261OPGJmhxKHDa9z3k2Dkcw5i7YYV9iLi12WIHQ4aIB8ifIb6nc3Ti7SbBxW-fope72-fFQ7V8un9czJeV47TOFSguGi2ZV0DaVs6YlE2ZalFKzzRVzHJoKXPeeqHkrK6FVF5ophyBtmn4FF0d7m5j-BghZbMOYxzKS8OEJkorUetCsQPlYkgpgjfb2PU27g0l5luiOUg0RaL5kWhECfFDKBV4WEH8O_1P6gsvsXUp</recordid><startdate>20210201</startdate><enddate>20210201</enddate><creator>Liu, Yong</creator><creator>Gu, Huaxi</creator><creator>Wang, Kun</creator><creator>Yu, Xiaoshan</creator><creator>Wang, Yunhao</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0002-6409-2229</orcidid></search><sort><creationdate>20210201</creationdate><title>An adaptive failure recovery mechanism based on asymmetric routing for data center networks</title><author>Liu, Yong ; Gu, Huaxi ; Wang, Kun ; Yu, Xiaoshan ; Wang, 
Yunhao</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-e734b852f7e0dd56255be0d94444868172a3ed12cfaf475699457f4827c0edbb3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Asymmetry</topic><topic>Compilers</topic><topic>Completion time</topic><topic>Computer centers</topic><topic>Computer Science</topic><topic>Data centers</topic><topic>Energy consumption</topic><topic>Energy recovery</topic><topic>Failure</topic><topic>Failure detection</topic><topic>Interpreters</topic><topic>Processor Architectures</topic><topic>Programming Languages</topic><topic>Recovery time</topic><topic>Rerouteing</topic><topic>Robustness</topic><topic>Route planning</topic><topic>Schedules</topic><topic>Topology</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Liu, Yong</creatorcontrib><creatorcontrib>Gu, Huaxi</creatorcontrib><creatorcontrib>Wang, Kun</creatorcontrib><creatorcontrib>Yu, Xiaoshan</creatorcontrib><creatorcontrib>Wang, Yunhao</creatorcontrib><collection>CrossRef</collection><jtitle>The Journal of supercomputing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Liu, Yong</au><au>Gu, Huaxi</au><au>Wang, Kun</au><au>Yu, Xiaoshan</au><au>Wang, Yunhao</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>An adaptive failure recovery mechanism based on asymmetric routing for data center networks</atitle><jtitle>The Journal of supercomputing</jtitle><stitle>J Supercomput</stitle><date>2021-02-01</date><risdate>2021</risdate><volume>77</volume><issue>2</issue><spage>2103</spage><epage>2123</epage><pages>2103-2123</pages><issn>0920-8542</issn><eissn>1573-0484</eissn><abstract>As the infrastructure of high-performance computing, the data center network plays an important role. 
As network failures occur frequently, data center networks demand highly performed, robust, and energy-efficient failure recovery mechanisms. Despite process, the existing work still has a huge scope to improve to satisfy these requirements. The backup-based failure recovery schemes reserve backup paths in advance, which results in a large energy consumption under normal network conditions. In order to solve the energy consumption problem, the existing adaptive failure recovery schemes are proposed to calculate the rerouting path of the traffic on the failed link, which reduces the energy consumption. However, most adaptive fault recovery solutions apply multi-path routing to calculate the re-routing path. As multi-path routing cannot detect the congestion status of the path under the asymmetric topology caused by link failures, the network is congested, which ends up in less robustness of the network. In view of this, we design and evaluate AFRM, a novel adaptive failure recovery mechanism that overcomes these challenges. AFRM uses asymmetrical routing to calculate the re-routing path by being congestion-aware and is more robust to topological asymmetries compared with existing schemes. The asymmetrical routing dynamically schedules flows to the path with the least marginal cost, which makes AFRM much more energy-efficient. Additionally, AFRM achieves fast link failure detection based on hash storage and flow table matching. Evaluations show that AFRM can do the trade-off between failure recovery time and energy consumption, reduce flow completion time, and increase network throughput compared with existing schemes.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s11227-020-03337-4</doi><tpages>21</tpages><orcidid>https://orcid.org/0000-0002-6409-2229</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0920-8542 |
ispartof | The Journal of supercomputing, 2021-02, Vol.77 (2), p.2103-2123 |
issn | 0920-8542 ; 1573-0484 |
language | eng |
recordid | cdi_proquest_journals_2480787498 |
source | SpringerLink Journals - AutoHoldings |
subjects | Asymmetry ; Compilers ; Completion time ; Computer centers ; Computer Science ; Data centers ; Energy consumption ; Energy recovery ; Failure ; Failure detection ; Interpreters ; Processor Architectures ; Programming Languages ; Recovery time ; Rerouteing ; Robustness ; Route planning ; Schedules ; Topology |
title | An adaptive failure recovery mechanism based on asymmetric routing for data center networks |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T06%3A29%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20adaptive%20failure%20recovery%20mechanism%20based%20on%20asymmetric%20routing%20for%20data%20center%20networks&rft.jtitle=The%20Journal%20of%20supercomputing&rft.au=Liu,%20Yong&rft.date=2021-02-01&rft.volume=77&rft.issue=2&rft.spage=2103&rft.epage=2123&rft.pages=2103-2123&rft.issn=0920-8542&rft.eissn=1573-0484&rft_id=info:doi/10.1007/s11227-020-03337-4&rft_dat=%3Cproquest_cross%3E2480787498%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2480787498&rft_id=info:pmid/&rfr_iscdi=true |
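
The description field in this record says that AFRM schedules flows onto the path with the least marginal cost. The record does not give the cost model, so the following is only a minimal sketch of that general idea under stated assumptions: the `Path` class, the quadratic cost function, the capacity check, and the path names are all hypothetical and are not taken from the article.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Path:
    """A candidate rerouting path (hypothetical model, not from the article)."""
    name: str
    load: float      # traffic currently carried, in arbitrary units
    capacity: float  # maximum traffic the path can carry, same units


def quadratic_cost(load: float) -> float:
    """Assumed convex cost of carrying `load`; a stand-in for a real cost model."""
    return load ** 2


def marginal_cost(path: Path, demand: float,
                  cost: Callable[[float], float]) -> float:
    """Extra cost incurred by adding `demand` on top of the path's current load."""
    return cost(path.load + demand) - cost(path.load)


def pick_path(paths: Sequence[Path], demand: float,
              cost: Callable[[float], float] = quadratic_cost) -> Path:
    """Return the feasible path whose marginal cost for `demand` is smallest."""
    feasible = [p for p in paths if p.load + demand <= p.capacity]
    if not feasible:
        raise ValueError("no surviving path has enough spare capacity")
    return min(feasible, key=lambda p: marginal_cost(p, demand, cost))


# Three surviving paths after a link failure (made-up numbers).
paths = [
    Path("via-core-1", load=6.0, capacity=10.0),
    Path("via-core-2", load=2.0, capacity=10.0),
    Path("via-core-3", load=9.5, capacity=10.0),
]
chosen = pick_path(paths, demand=1.0)
print(chosen.name)
```

With a convex cost, a lightly loaded path has a lower marginal cost than a heavily loaded one, so a selection rule of this kind is congestion-aware by construction. Whether AFRM uses a cost of this shape is not stated in the record.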