A Deadlock-Free and Connectivity-Guaranteed Methodology for Achieving Fault-Tolerance in On-Chip Networks

To improve the reliability of on-chip network based systems, we design a deadlock-free routing technique that is more resilient to component failures and guarantees a higher degree of node connectivity. The routing methodology consists of three key steps. First, we determine the maximal connected su...

Bibliographic Details
Published in: IEEE transactions on computers 2016-02, Vol.65 (2), p.353-366
Main Authors: Pengju Ren; Xiaowei Ren; Sane, Sudhanshu; Kinsy, Michel A.; Nanning Zheng
Format: Article
Language: English
container_end_page 366
container_issue 2
container_start_page 353
container_title IEEE transactions on computers
container_volume 65
creator Pengju Ren
Xiaowei Ren
Sane, Sudhanshu
Kinsy, Michel A.
Nanning Zheng
description To improve the reliability of on-chip network based systems, we design a deadlock-free routing technique that is more resilient to component failures and guarantees a higher degree of node connectivity. The routing methodology consists of three key steps. First, we determine the maximal connected subgraph of the faulty network by checking whether the defective components happen to be the cut vertices and bridges of the network topology. A precise fault diagnosis mechanism is used to identify partially defective routers. Second, we construct an acyclic channel dependency graph that breaks all cycles and preserves connectivity of the maximal connected subgraph. This is done through the cycle-breaking and connectivity guaranteed (CBCG) algorithm. Finally, we introduce a fault-tolerant adaptive routing scheme that can be used with or without virtual channels for network congestion avoidance and high-throughput routing. The simulation results show both the effectiveness and robustness of the proposed approach. For an 8 × 8 2D-Mesh with 40 percent of link damage, full connectivity and deadlock freedom are still achieved without disabling any faultless router in 98.18 percent of the simulations. In a 2D-Torus, the simulation percentage is even higher (99.93 percent). The hardware overhead for supporting the introduced features is minimal. An on-line implementation of CBCG using TSMC 65nm library has only 0.966 and 1.139 percent area overhead for the 8 × 8 and 16 × 16 2D-Meshes.
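The first step named in the description (finding the maximal connected subgraph and the bridges of a faulty network) can be illustrated with a small sketch. This is not the paper's CBCG algorithm or its hardware implementation; it is a plain graph-theoretic illustration, and all names in it (mesh_links, adjacency, largest_component, find_bridges, the 3 × 3 example and its faulty links) are hypothetical choices made for this example only.

```python
from collections import deque


def mesh_links(n):
    """Return the undirected links of an n x n 2D mesh; routers are (row, col)."""
    links = set()
    for r in range(n):
        for c in range(n):
            if c + 1 < n:
                links.add(frozenset({(r, c), (r, c + 1)}))
            if r + 1 < n:
                links.add(frozenset({(r, c), (r + 1, c)}))
    return links


def adjacency(n, links):
    """Build an adjacency map for all n*n routers from the surviving links."""
    adj = {(r, c): set() for r in range(n) for c in range(n)}
    for link in links:
        a, b = tuple(link)
        adj[a].add(b)
        adj[b].add(a)
    return adj


def largest_component(adj):
    """Breadth-first search for the largest set of mutually reachable routers."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in component:
                    component.add(w)
                    queue.append(w)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def find_bridges(adj):
    """Low-link depth-first search: a link is a bridge if removing it disconnects its ends."""
    disc, low, bridges = {}, {}, set()

    def dfs(v, parent):
        disc[v] = low[v] = len(disc)
        for w in adj[v]:
            if w == parent:
                continue
            if w in disc:
                low[v] = min(low[v], disc[w])
            else:
                dfs(w, v)
                low[v] = min(low[v], low[w])
                if low[w] > disc[v]:
                    bridges.add(frozenset({v, w}))

    for v in adj:
        if v not in disc:
            dfs(v, None)
    return bridges


# Hypothetical example: a 3 x 3 mesh with three failed links.
n = 3
faulty = {
    frozenset({(0, 0), (0, 1)}),
    frozenset({(0, 0), (1, 0)}),
    frozenset({(1, 0), (2, 0)}),
}
adj = adjacency(n, mesh_links(n) - faulty)
component = largest_component(adj)
bridges = find_bridges(adj)
print(len(component), sorted(sorted(b) for b in bridges))
```

In this sketch a router that loses all of its links falls outside the largest component, and any surviving link reported as a bridge is one whose additional failure would disconnect further routers. The recursive search is adequate for small meshes such as 8 × 8 or 16 × 16; a larger topology would call for an iterative version.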
doi_str_mv 10.1109/TC.2015.2425887
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TC_2015_2425887</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>7093169</ieee_id><sourcerecordid>3921447031</sourcerecordid><originalsourceid>FETCH-LOGICAL-c322t-c95c39d62623515f3d4ec6dd9736008bfd9ae3f4c6fa6e9ac7c5c39d9cad52793</originalsourceid><addsrcrecordid>eNpdkL1PGzEYh62qSE0pc4culli6OPjjbJ_H6EooEpQlzJax3yOGww72HSj_PUmDGJje5Xl-evUg9JPROWPUnK26OadMznnDZdvqL2jGpNTEGKm-ohmlrCVGNPQb-l7rA6VUcWpmKC7wH3BhyP6RLAsAdingLqcEfowvcdySi8kVl0aAgK9hXOeQh3y_xX0ueOHXEV5iusdLNw0jWeUBdqwHHBO-SaRbxw3-B-NrLo_1Bzrq3VDh5P0eo9vl-ar7S65uLi67xRXxgvOReCO9MEFxxYVkshehAa9CMFooStu7PhgHom-86p0C47z2_wXjXZBcG3GMfh92NyU_T1BH-xSrh2FwCfJULWuZotqwZo-efkIf8lTS7jvLtFSaNYKKHXV2oHzJtRbo7abEJ1e2llG7T29Xnd2nt-_pd8avgxEB4IPW1AimjHgDbBh_WA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1756714303</pqid></control><display><type>article</type><title>A Deadlock-Free and Connectivity-Guaranteed Methodology for Achieving Fault-Tolerance in On-Chip Networks</title><source>IEEE Electronic Library (IEL)</source><creator>Pengju Ren ; Xiaowei Ren ; Sane, Sudhanshu ; Kinsy, Michel A. ; Nanning Zheng</creator><creatorcontrib>Pengju Ren ; Xiaowei Ren ; Sane, Sudhanshu ; Kinsy, Michel A. ; Nanning Zheng</creatorcontrib><description>To improve the reliability of on-chip network based systems, we design a deadlock-free routing technique that is more resilient to component failures and guarantees a higher degree of node connectivity. The routing methodology consists of three key steps. First, we determine the maximal connected subgraph of the faulty network by checking whether the defective components happen to be the cut vertices and bridges of the network topology. A precise fault diagnosis mechanism is used to identify partial defective routers. 
Second, we construct an acyclic channel dependency graph that breaks all cycles and preserves connectivity of the maximal connected subgraph. This is done through the cycle-breaking and connectivity guaranteed (CBCG) algorithm. Finally, we introduce a fault-tolerant adaptive routing scheme that can be used with or without virtual channels for network congestion avoidance and high-throughput routing. The simulation results show both the effectiveness and robustness of the proposed approach. For an 8 × 8 2D-Mesh with 40 percent of link damage, full connectivity and deadlock freedom are still achieved without disabling any faultless router in 98.18 percent of the simulations. In a 2D-Torus, the simulation percentage is even higher (99.93 percent). The hardware overhead for supporting the introduced features is minimal. An on-line implementation of CBCG using TSMC 65nm library has only 0.966 and 1.139 percent area overhead for the 8 × 8 and 16 × 16 2D-Meshes.</description><identifier>ISSN: 0018-9340</identifier><identifier>EISSN: 1557-9956</identifier><identifier>DOI: 10.1109/TC.2015.2425887</identifier><identifier>CODEN: ITCOB4</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Bridges ; Channel Dependency Graph ; Channels ; Computer networks ; Computer simulation ; Connectivity ; Fault tolerance ; Fault tolerant systems ; Network topology ; Network-on-chip ; Networks ; Portable document format ; Queuing theory ; Reliability ; Routers ; Routing ; Routing (telecommunications) ; Routing algorithm ; System recovery ; System-on-chip</subject><ispartof>IEEE transactions on computers, 2016-02, Vol.65 (2), p.353-366</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) Feb 2016</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c322t-c95c39d62623515f3d4ec6dd9736008bfd9ae3f4c6fa6e9ac7c5c39d9cad52793</citedby><cites>FETCH-LOGICAL-c322t-c95c39d62623515f3d4ec6dd9736008bfd9ae3f4c6fa6e9ac7c5c39d9cad52793</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/7093169$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27923,27924,54757</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/7093169$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Pengju Ren</creatorcontrib><creatorcontrib>Xiaowei Ren</creatorcontrib><creatorcontrib>Sane, Sudhanshu</creatorcontrib><creatorcontrib>Kinsy, Michel A.</creatorcontrib><creatorcontrib>Nanning Zheng</creatorcontrib><title>A Deadlock-Free and Connectivity-Guaranteed Methodology for Achieving Fault-Tolerance in On-Chip Networks</title><title>IEEE transactions on computers</title><addtitle>TC</addtitle><description>To improve the reliability of on-chip network based systems, we design a deadlock-free routing technique that is more resilient to component failures and guarantees a higher degree of node connectivity. The routing methodology consists of three key steps. First, we determine the maximal connected subgraph of the faulty network by checking whether the defective components happen to be the cut vertices and bridges of the network topology. A precise fault diagnosis mechanism is used to identify partial defective routers. Second, we construct an acyclic channel dependency graph that breaks all cycles and preserves connectivity of the maximal connected subgraph. This is done through the cycle-breaking and connectivity guaranteed (CBCG) algorithm. 
Finally, we introduce a fault-tolerant adaptive routing scheme that can be used with or without virtual channels for network congestion avoidance and high-throughput routing. The simulation results show both the effectiveness and robustness of the proposed approach. For an 8 × 8 2D-Mesh with 40 percent of link damage, full connectivity and deadlock freedom are still achieved without disabling any faultless router in 98.18 percent of the simulations. In a 2D-Torus, the simulation percentage is even higher (99.93 percent). The hardware overhead for supporting the introduced features is minimal. An on-line implementation of CBCG using TSMC 65nm library has only 0.966 and 1.139 percent area overhead for the 8 × 8 and 16 × 16 2D-Meshes.</description><subject>Bridges</subject><subject>Channel Dependency Graph</subject><subject>Channels</subject><subject>Computer networks</subject><subject>Computer simulation</subject><subject>Connectivity</subject><subject>Fault tolerance</subject><subject>Fault tolerant systems</subject><subject>Network topology</subject><subject>Network-on-chip</subject><subject>Networks</subject><subject>Portable document format</subject><subject>Queuing theory</subject><subject>Reliability</subject><subject>Routers</subject><subject>Routing</subject><subject>Routing (telecommunications)</subject><subject>Routing algorithm</subject><subject>System 
recovery</subject><subject>System-on-chip</subject><issn>0018-9340</issn><issn>1557-9956</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkL1PGzEYh62qSE0pc4culli6OPjjbJ_H6EooEpQlzJax3yOGww72HSj_PUmDGJje5Xl-evUg9JPROWPUnK26OadMznnDZdvqL2jGpNTEGKm-ohmlrCVGNPQb-l7rA6VUcWpmKC7wH3BhyP6RLAsAdingLqcEfowvcdySi8kVl0aAgK9hXOeQh3y_xX0ueOHXEV5iusdLNw0jWeUBdqwHHBO-SaRbxw3-B-NrLo_1Bzrq3VDh5P0eo9vl-ar7S65uLi67xRXxgvOReCO9MEFxxYVkshehAa9CMFooStu7PhgHom-86p0C47z2_wXjXZBcG3GMfh92NyU_T1BH-xSrh2FwCfJULWuZotqwZo-efkIf8lTS7jvLtFSaNYKKHXV2oHzJtRbo7abEJ1e2llG7T29Xnd2nt-_pd8avgxEB4IPW1AimjHgDbBh_WA</recordid><startdate>20160201</startdate><enddate>20160201</enddate><creator>Pengju Ren</creator><creator>Xiaowei Ren</creator><creator>Sane, Sudhanshu</creator><creator>Kinsy, Michel A.</creator><creator>Nanning Zheng</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>F28</scope><scope>FR3</scope></search><sort><creationdate>20160201</creationdate><title>A Deadlock-Free and Connectivity-Guaranteed Methodology for Achieving Fault-Tolerance in On-Chip Networks</title><author>Pengju Ren ; Xiaowei Ren ; Sane, Sudhanshu ; Kinsy, Michel A. 
; Nanning Zheng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c322t-c95c39d62623515f3d4ec6dd9736008bfd9ae3f4c6fa6e9ac7c5c39d9cad52793</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Bridges</topic><topic>Channel Dependency Graph</topic><topic>Channels</topic><topic>Computer networks</topic><topic>Computer simulation</topic><topic>Connectivity</topic><topic>Fault tolerance</topic><topic>Fault tolerant systems</topic><topic>Network topology</topic><topic>Network-on-chip</topic><topic>Networks</topic><topic>Portable document format</topic><topic>Queuing theory</topic><topic>Reliability</topic><topic>Routers</topic><topic>Routing</topic><topic>Routing (telecommunications)</topic><topic>Routing algorithm</topic><topic>System recovery</topic><topic>System-on-chip</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Pengju Ren</creatorcontrib><creatorcontrib>Xiaowei Ren</creatorcontrib><creatorcontrib>Sane, Sudhanshu</creatorcontrib><creatorcontrib>Kinsy, Michel A.</creatorcontrib><creatorcontrib>Nanning Zheng</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ANTE: Abstracts in New Technology &amp; 
Engineering</collection><collection>Engineering Research Database</collection><jtitle>IEEE transactions on computers</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Pengju Ren</au><au>Xiaowei Ren</au><au>Sane, Sudhanshu</au><au>Kinsy, Michel A.</au><au>Nanning Zheng</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Deadlock-Free and Connectivity-Guaranteed Methodology for Achieving Fault-Tolerance in On-Chip Networks</atitle><jtitle>IEEE transactions on computers</jtitle><stitle>TC</stitle><date>2016-02-01</date><risdate>2016</risdate><volume>65</volume><issue>2</issue><spage>353</spage><epage>366</epage><pages>353-366</pages><issn>0018-9340</issn><eissn>1557-9956</eissn><coden>ITCOB4</coden><abstract>To improve the reliability of on-chip network based systems, we design a deadlock-free routing technique that is more resilient to component failures and guarantees a higher degree of node connectivity. The routing methodology consists of three key steps. First, we determine the maximal connected subgraph of the faulty network by checking whether the defective components happen to be the cut vertices and bridges of the network topology. A precise fault diagnosis mechanism is used to identify partial defective routers. Second, we construct an acyclic channel dependency graph that breaks all cycles and preserves connectivity of the maximal connected subgraph. This is done through the cycle-breaking and connectivity guaranteed (CBCG) algorithm. Finally, we introduce a fault-tolerant adaptive routing scheme that can be used with or without virtual channels for network congestion avoidance and high-throughput routing. The simulation results show both the effectiveness and robustness of the proposed approach. 
For an 8 × 8 2D-Mesh with 40 percent of link damage, full connectivity and deadlock freedom are still achieved without disabling any faultless router in 98.18 percent of the simulations. In a 2D-Torus, the simulation percentage is even higher (99.93 percent). The hardware overhead for supporting the introduced features is minimal. An on-line implementation of CBCG using TSMC 65nm library has only 0.966 and 1.139 percent area overhead for the 8 × 8 and 16 × 16 2D-Meshes.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TC.2015.2425887</doi><tpages>14</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 0018-9340
ispartof IEEE transactions on computers, 2016-02, Vol.65 (2), p.353-366
issn 0018-9340
1557-9956
language eng
recordid cdi_crossref_primary_10_1109_TC_2015_2425887
source IEEE Electronic Library (IEL)
subjects Bridges
Channel Dependency Graph
Channels
Computer networks
Computer simulation
Connectivity
Fault tolerance
Fault tolerant systems
Network topology
Network-on-chip
Networks
Portable document format
Queuing theory
Reliability
Routers
Routing
Routing (telecommunications)
Routing algorithm
System recovery
System-on-chip
title A Deadlock-Free and Connectivity-Guaranteed Methodology for Achieving Fault-Tolerance in On-Chip Networks
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T00%3A12%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Deadlock-Free%20and%20Connectivity-Guaranteed%20Methodology%20for%20Achieving%20Fault-Tolerance%20in%20On-Chip%20Networks&rft.jtitle=IEEE%20transactions%20on%20computers&rft.au=Pengju%20Ren&rft.date=2016-02-01&rft.volume=65&rft.issue=2&rft.spage=353&rft.epage=366&rft.pages=353-366&rft.issn=0018-9340&rft.eissn=1557-9956&rft.coden=ITCOB4&rft_id=info:doi/10.1109/TC.2015.2425887&rft_dat=%3Cproquest_RIE%3E3921447031%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1756714303&rft_id=info:pmid/&rft_ieee_id=7093169&rfr_iscdi=true