Response Time and Availability Study of RAFT Consensus in Distributed SDN Control Plane

Software defined networking (SDN) promises unprecedented flexibility and ease of network operations. While flexibility is an important factor when leveraging advantages of a new technology, critical infrastructure networks also have stringent requirements on network robustness and control plane dela...

Bibliographic details
Published in: IEEE eTransactions on network and service management 2018-03, Vol.15 (1), p.304-318
Main authors: Sakic, Ermin; Kellerer, Wolfgang
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 318
container_issue 1
container_start_page 304
container_title IEEE eTransactions on network and service management
container_volume 15
creator Sakic, Ermin
Kellerer, Wolfgang
description Software defined networking (SDN) promises unprecedented flexibility and ease of network operations. While flexibility is an important factor when leveraging advantages of a new technology, critical infrastructure networks also have stringent requirements on network robustness and control plane delays. Robustness in the SDN control plane is realized by deploying multiple distributed controllers, formed into clusters for durability and fast-failover purposes. However, the effect of the controller clustering on the total system response time is not well investigated in current literature. Hence, in this work we provide a detailed analytical study of the distributed consensus algorithm RAFT, implemented in OpenDaylight and ONOS SDN controller platforms. In those controllers, RAFT implements the data-store replication, leader election after controller failures and controller state recovery on successful repairs. To evaluate its performance, we introduce a framework for numerical analysis of various SDN cluster organizations w.r.t. their response time and availability metrics. We use Stochastic Activity Networks for modeling the RAFT operations, failure injection and cluster recovery processes, and using real-world experiments, we collect the rate parameters to provide realistic inputs for a representative cluster recovery model. We also show how a fast rejuvenation mechanism for the treatment of failures induced by software errors can minimize the total response time experienced by the controller clients, while guaranteeing a higher system availability in the long-term.
doi_str_mv 10.1109/TNSM.2017.2775061
format Article
fullrecord <record><control><sourceid>crossref_ieee_</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TNSM_2017_2775061</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8114202</ieee_id><sourcerecordid>10_1109_TNSM_2017_2775061</sourcerecordid><originalsourceid>FETCH-LOGICAL-c265t-80c68514f559cabb65614b526f5e87562c6da1d86284dffe292af6ed120bac053</originalsourceid><addsrcrecordid>eNpNkN9KwzAUxoMoOKcPIN7kBTpzTps0vRybm8KcslW8LGlzApGuHU0n7O21bIhX3wffn4sfY_cgJgAie8zX29cJCkgnmKZSKLhgI8hijBIZp5f__DW7CeFLCKkhwxH73FDYt00gnvsdcdNYPv02vjalr31_5Nv-YI-8dXwzXeR8NjSbcAjcN3zuQ9_58tCT5dv5egj7rq35e20aumVXztSB7s46Zh-Lp3z2HK3eli-z6SqqUMk-0qJSWkLipMwqU5ZKKkhKicpJ0qlUWClrwGqFOrHOEWZonCILKEpTCRmPGZx-q64NoSNX7Du_M92xAFEMZIqBTDGQKc5kfjcPp40nor--BkhQYPwDW4VfYQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Response Time and Availability Study of RAFT Consensus in Distributed SDN Control Plane</title><source>IEEE Xplore</source><creator>Sakic, Ermin ; Kellerer, Wolfgang</creator><creatorcontrib>Sakic, Ermin ; Kellerer, Wolfgang</creatorcontrib><description>Software defined networking (SDN) promises unprecedented flexibility and ease of network operations. While flexibility is an important factor when leveraging advantages of a new technology, critical infrastructure networks also have stringent requirements on network robustness and control plane delays. Robustness in the SDN control plane is realized by deploying multiple distributed controllers, formed into clusters for durability and fast-failover purposes. However, the effect of the controller clustering on the total system response time is not well investigated in current literature. Hence, in this work we provide a detailed analytical study of the distributed consensus algorithm RAFT, implemented in OpenDaylight and ONOS SDN controller platforms. 
In those controllers, RAFT implements the data-store replication, leader election after controller failures and controller state recovery on successful repairs. To evaluate its performance, we introduce a framework for numerical analysis of various SDN cluster organizations w.r.t. their response time and availability metrics. We use Stochastic Activity Networks for modeling the RAFT operations, failure injection and cluster recovery processes, and using real-world experiments, we collect the rate parameters to provide realistic inputs for a representative cluster recovery model. We also show how a fast rejuvenation mechanism for the treatment of failures induced by software errors can minimize the total response time experienced by the controller clients, while guaranteeing a higher system availability in the long-term.</description><identifier>ISSN: 1932-4537</identifier><identifier>EISSN: 1932-4537</identifier><identifier>DOI: 10.1109/TNSM.2017.2775061</identifier><identifier>CODEN: ITNSC4</identifier><language>eng</language><publisher>IEEE</publisher><subject>Clustering algorithms ; Control systems ; Delays ; distributed control plane ; fault tolerance ; Numerical models ; ONOS ; OpenDaylight ; Performance analysis ; RAFT ; SDN ; smart grid ; stochastic activity networks ; Stochastic processes ; strong consistency ; Synchronization ; Time factors</subject><ispartof>IEEE eTransactions on network and service management, 2018-03, Vol.15 (1), 
p.304-318</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c265t-80c68514f559cabb65614b526f5e87562c6da1d86284dffe292af6ed120bac053</citedby><cites>FETCH-LOGICAL-c265t-80c68514f559cabb65614b526f5e87562c6da1d86284dffe292af6ed120bac053</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8114202$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,792,27903,27904,54736</link.rule.ids></links><search><creatorcontrib>Sakic, Ermin</creatorcontrib><creatorcontrib>Kellerer, Wolfgang</creatorcontrib><title>Response Time and Availability Study of RAFT Consensus in Distributed SDN Control Plane</title><title>IEEE eTransactions on network and service management</title><addtitle>T-NSM</addtitle><description>Software defined networking (SDN) promises unprecedented flexibility and ease of network operations. While flexibility is an important factor when leveraging advantages of a new technology, critical infrastructure networks also have stringent requirements on network robustness and control plane delays. Robustness in the SDN control plane is realized by deploying multiple distributed controllers, formed into clusters for durability and fast-failover purposes. However, the effect of the controller clustering on the total system response time is not well investigated in current literature. Hence, in this work we provide a detailed analytical study of the distributed consensus algorithm RAFT, implemented in OpenDaylight and ONOS SDN controller platforms. In those controllers, RAFT implements the data-store replication, leader election after controller failures and controller state recovery on successful repairs. 
To evaluate its performance, we introduce a framework for numerical analysis of various SDN cluster organizations w.r.t. their response time and availability metrics. We use Stochastic Activity Networks for modeling the RAFT operations, failure injection and cluster recovery processes, and using real-world experiments, we collect the rate parameters to provide realistic inputs for a representative cluster recovery model. We also show how a fast rejuvenation mechanism for the treatment of failures induced by software errors can minimize the total response time experienced by the controller clients, while guaranteeing a higher system availability in the long-term.</description><subject>Clustering algorithms</subject><subject>Control systems</subject><subject>Delays</subject><subject>distributed control plane</subject><subject>fault tolerance</subject><subject>Numerical models</subject><subject>ONOS</subject><subject>OpenDaylight</subject><subject>Performance analysis</subject><subject>RAFT</subject><subject>SDN</subject><subject>smart grid</subject><subject>stochastic activity networks</subject><subject>Stochastic processes</subject><subject>strong consistency</subject><subject>Synchronization</subject><subject>Time factors</subject><issn>1932-4537</issn><issn>1932-4537</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><recordid>eNpNkN9KwzAUxoMoOKcPIN7kBTpzTps0vRybm8KcslW8LGlzApGuHU0n7O21bIhX3wffn4sfY_cgJgAie8zX29cJCkgnmKZSKLhgI8hijBIZp5f__DW7CeFLCKkhwxH73FDYt00gnvsdcdNYPv02vjalr31_5Nv-YI-8dXwzXeR8NjSbcAjcN3zuQ9_58tCT5dv5egj7rq35e20aumVXztSB7s46Zh-Lp3z2HK3eli-z6SqqUMk-0qJSWkLipMwqU5ZKKkhKicpJ0qlUWClrwGqFOrHOEWZonCILKEpTCRmPGZx-q64NoSNX7Du_M92xAFEMZIqBTDGQKc5kfjcPp40nor--BkhQYPwDW4VfYQ</recordid><startdate>201803</startdate><enddate>201803</enddate><creator>Sakic, Ermin</creator><creator>Kellerer, 
Wolfgang</creator><general>IEEE</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>201803</creationdate><title>Response Time and Availability Study of RAFT Consensus in Distributed SDN Control Plane</title><author>Sakic, Ermin ; Kellerer, Wolfgang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c265t-80c68514f559cabb65614b526f5e87562c6da1d86284dffe292af6ed120bac053</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Clustering algorithms</topic><topic>Control systems</topic><topic>Delays</topic><topic>distributed control plane</topic><topic>fault tolerance</topic><topic>Numerical models</topic><topic>ONOS</topic><topic>OpenDaylight</topic><topic>Performance analysis</topic><topic>RAFT</topic><topic>SDN</topic><topic>smart grid</topic><topic>stochastic activity networks</topic><topic>Stochastic processes</topic><topic>strong consistency</topic><topic>Synchronization</topic><topic>Time factors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sakic, Ermin</creatorcontrib><creatorcontrib>Kellerer, Wolfgang</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Xplore Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Xplore</collection><collection>CrossRef</collection><jtitle>IEEE eTransactions on network and service management</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sakic, Ermin</au><au>Kellerer, Wolfgang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Response Time and Availability Study of RAFT Consensus in Distributed SDN Control 
Plane</atitle><jtitle>IEEE eTransactions on network and service management</jtitle><stitle>T-NSM</stitle><date>2018-03</date><risdate>2018</risdate><volume>15</volume><issue>1</issue><spage>304</spage><epage>318</epage><pages>304-318</pages><issn>1932-4537</issn><eissn>1932-4537</eissn><coden>ITNSC4</coden><abstract>Software defined networking (SDN) promises unprecedented flexibility and ease of network operations. While flexibility is an important factor when leveraging advantages of a new technology, critical infrastructure networks also have stringent requirements on network robustness and control plane delays. Robustness in the SDN control plane is realized by deploying multiple distributed controllers, formed into clusters for durability and fast-failover purposes. However, the effect of the controller clustering on the total system response time is not well investigated in current literature. Hence, in this work we provide a detailed analytical study of the distributed consensus algorithm RAFT, implemented in OpenDaylight and ONOS SDN controller platforms. In those controllers, RAFT implements the data-store replication, leader election after controller failures and controller state recovery on successful repairs. To evaluate its performance, we introduce a framework for numerical analysis of various SDN cluster organizations w.r.t. their response time and availability metrics. We use Stochastic Activity Networks for modeling the RAFT operations, failure injection and cluster recovery processes, and using real-world experiments, we collect the rate parameters to provide realistic inputs for a representative cluster recovery model. 
We also show how a fast rejuvenation mechanism for the treatment of failures induced by software errors can minimize the total response time experienced by the controller clients, while guaranteeing a higher system availability in the long-term.</abstract><pub>IEEE</pub><doi>10.1109/TNSM.2017.2775061</doi><tpages>15</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1932-4537
ispartof IEEE eTransactions on network and service management, 2018-03, Vol.15 (1), p.304-318
issn 1932-4537
1932-4537
language eng
recordid cdi_crossref_primary_10_1109_TNSM_2017_2775061
source IEEE Xplore
subjects Clustering algorithms
Control systems
Delays
distributed control plane
fault tolerance
Numerical models
ONOS
OpenDaylight
Performance analysis
RAFT
SDN
smart grid
stochastic activity networks
Stochastic processes
strong consistency
Synchronization
Time factors
title Response Time and Availability Study of RAFT Consensus in Distributed SDN Control Plane
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T21%3A55%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Response%20Time%20and%20Availability%20Study%20of%20RAFT%20Consensus%20in%20Distributed%20SDN%20Control%20Plane&rft.jtitle=IEEE%20eTransactions%20on%20network%20and%20service%20management&rft.au=Sakic,%20Ermin&rft.date=2018-03&rft.volume=15&rft.issue=1&rft.spage=304&rft.epage=318&rft.pages=304-318&rft.issn=1932-4537&rft.eissn=1932-4537&rft.coden=ITNSC4&rft_id=info:doi/10.1109/TNSM.2017.2775061&rft_dat=%3Ccrossref_ieee_%3E10_1109_TNSM_2017_2775061%3C/crossref_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=8114202&rfr_iscdi=true