Dart: Divide and Specialize for Fast Response to Congestion in RDMA-Based Datacenter Networks

Though Remote Direct Memory Access (RDMA) promises to reduce datacenter network latencies significantly compared to TCP (e.g., 10x), end-to-end congestion control in the presence of incasts is a challenge. Targeting the full generality of the congestion problem, previous schemes rely on slow, iterat...

Detailed description

Bibliographic details
Published in:	IEEE/ACM transactions on networking 2020-02, Vol.28 (1), p.322-335
Main authors:	Xue, Jiachen ; Chaudhry, Muhammad Usama ; Vamanan, Balajee ; Vijaykumar, T. N. ; Thottethodi, Mithuna
Format:	Article
Language:	eng
Subjects:	Congestion ; congestion control ; Convergence ; Datacenters ; Dispersion ; Flow deflection ; Hardware ; IEEE transactions ; Iterative methods ; Network latency ; RDMA ; Receivers ; Switches ; Switching theory ; Throughput
Online access:	Order full text

container_end_page	335
container_issue	1
container_start_page	322
container_title	IEEE/ACM transactions on networking
container_volume	28
creator	Xue, Jiachen ; Chaudhry, Muhammad Usama ; Vamanan, Balajee ; Vijaykumar, T. N. ; Thottethodi, Mithuna
description	Though Remote Direct Memory Access (RDMA) promises to reduce datacenter network latencies significantly compared to TCP (e.g., 10x), end-to-end congestion control in the presence of incasts is a challenge. Targeting the full generality of the congestion problem, previous schemes rely on slow, iterative convergence to the appropriate sending rates (e.g., TIMELY takes 50 RTTs). Several papers have shown that even in oversubscribed datacenter networks most congestion occurs at the receiver. Accordingly, we propose a divide-and-specialize approach, called Dart, which isolates the common case of receiver congestion and further subdivides the remaining in-network congestion into the simpler spatially-localized and the harder spatially-dispersed cases. For receiver congestion, we propose direct apportioning of sending rates (DASR) in which a receiver for n senders directs each sender to cut its rate by a factor of n, converging in only one RTT. For the spatially-localized case, Dart provides fast (under one RTT) response by adding novel switch hardware for in-order flow deflection (IOFD) because RDMA disallows packet reordering on which previous load balancing schemes rely. For the uncommon spatially-dispersed case, Dart falls back to DCQCN. Small-scale testbed measurements and at-scale simulations, respectively, show that Dart achieves 60% (2.5x) and 79% (4.8x) lower 99th-percentile latency, and similar and 58% higher throughput than InfiniBand, and TIMELY and DCQCN.
doi_str_mv	10.1109/TNET.2019.2961671
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TNET_2019_2961671</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8959354</ieee_id><sourcerecordid>2358913133</sourcerecordid><originalsourceid>FETCH-LOGICAL-c336t-68fd8c27829e452f5f3801e6db404f812fa9a79e217376c135d004ca0fad60453</originalsourceid><addsrcrecordid>eNo9kE1LAzEQhhdRsFZ_gHgJeN6aj0028Va7rQq1Qq1HCXF3Iql1U5NU0V_vlhZPM4fnfWd4suyc4AEhWF0tZuPFgGKiBlQJIkpykPUI5zKnXIjDbseC5UIoepydxLjEmDBMRS97qUxI16hyX64BZNoGPa2hdmblfgFZH9DExITmENe-jYCSRyPfvkFMzrfItWhePQzzGxOhQZVJpoY2QUAzSN8-vMfT7MiaVYSz_exnz5PxYnSXTx9v70fDaV4zJlIupG1kTUtJFRScWm6ZxARE81rgwkpCrVGmVEBJyUpRE8YbjIvaYGsagQvO-tnlrncd_Oem-04v_Sa03UlNGZeKMMJYR5EdVQcfYwCr18F9mPCjCdZbi3prUW8t6r3FLnOxyzgA-Oel4orxgv0B83Jsiw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2358913133</pqid></control><display><type>article</type><title>Dart: Divide and Specialize for Fast Response to Congestion in RDMA-Based Datacenter Networks</title><source>IEEE Electronic Library (IEL)</source><creator>Xue, Jiachen ; Chaudhry, Muhammad Usama ; Vamanan, Balajee ; Vijaykumar, T. N. ; Thottethodi, Mithuna</creator><creatorcontrib>Xue, Jiachen ; Chaudhry, Muhammad Usama ; Vamanan, Balajee ; Vijaykumar, T. N. ; Thottethodi, Mithuna</creatorcontrib><description>Though Remote Direct Memory Access (RDMA) promises to reduce datacenter network latencies significantly compared to TCP (e.g., 10x), end-to-end congestion control in the presence of incasts is a challenge. Targeting the full generality of the congestion problem, previous schemes rely on slow, iterative convergence to the appropriate sending rates (e.g., TIMELY takes 50 RTTs). Several papers have shown that even in oversubscribed datacenter networks most congestion occurs at the receiver. 
Accordingly, we propose a divide-and-specialize approach, called Dart, which isolates the common case of receiver congestion and further subdivides the remaining in-network congestion into the simpler spatially-localized and the harder spatially-dispersed cases. For receiver congestion, we propose direct apportioning of sending rates (DASR) in which a receiver for n senders directs each sender to cut its rate by a factor of n, converging in only one RTT. For the spatially-localized case, Dart provides fast (under one RTT) response by adding novel switch hardware for in-order flow deflection (IOFD) because RDMA disallows packet reordering on which previous load balancing schemes rely. For the uncommon spatially-dispersed case, Dart falls back to DCQCN. Small-scale testbed measurements and at-scale simulations, respectively, show that Dart achieves 60% (2.5x) and 79% (4.8x) lower 99th-percentile latency, and similar and 58% higher throughput than InfiniBand, and TIMELY and DCQCN.</description><identifier>ISSN: 1063-6692</identifier><identifier>EISSN: 1558-2566</identifier><identifier>DOI: 10.1109/TNET.2019.2961671</identifier><identifier>CODEN: IEANEP</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Congestion ; congestion control ; Convergence ; Datacenters ; Dispersion ; Flow deflection ; Hardware ; IEEE transactions ; Iterative methods ; Network latency ; RDMA ; Receivers ; Switches ; Switching theory ; Throughput</subject><ispartof>IEEE/ACM transactions on networking, 2020-02, Vol.28 (1), p.322-335</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2020</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c336t-68fd8c27829e452f5f3801e6db404f812fa9a79e217376c135d004ca0fad60453</citedby><cites>FETCH-LOGICAL-c336t-68fd8c27829e452f5f3801e6db404f812fa9a79e217376c135d004ca0fad60453</cites><orcidid>0000-0002-7581-6624</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8959354$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8959354$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Xue, Jiachen</creatorcontrib><creatorcontrib>Chaudhry, Muhammad Usama</creatorcontrib><creatorcontrib>Vamanan, Balajee</creatorcontrib><creatorcontrib>Vijaykumar, T. N.</creatorcontrib><creatorcontrib>Thottethodi, Mithuna</creatorcontrib><title>Dart: Divide and Specialize for Fast Response to Congestion in RDMA-Based Datacenter Networks</title><title>IEEE/ACM transactions on networking</title><addtitle>TNET</addtitle><description>Though Remote Direct Memory Access (RDMA) promises to reduce datacenter network latencies significantly compared to TCP (e.g., 10x), end-to-end congestion control in the presence of incasts is a challenge. Targeting the full generality of the congestion problem, previous schemes rely on slow, iterative convergence to the appropriate sending rates (e.g., TIMELY takes 50 RTTs). Several papers have shown that even in oversubscribed datacenter networks most congestion occurs at the receiver. 
Accordingly, we propose a divide-and-specialize approach, called Dart, which isolates the common case of receiver congestion and further subdivides the remaining in-network congestion into the simpler spatially-localized and the harder spatially-dispersed cases. For receiver congestion, we propose direct apportioning of sending rates (DASR) in which a receiver for n senders directs each sender to cut its rate by a factor of n, converging in only one RTT. For the spatially-localized case, Dart provides fast (under one RTT) response by adding novel switch hardware for in-order flow deflection (IOFD) because RDMA disallows packet reordering on which previous load balancing schemes rely. For the uncommon spatially-dispersed case, Dart falls back to DCQCN. Small-scale testbed measurements and at-scale simulations, respectively, show that Dart achieves 60% (2.5x) and 79% (4.8x) lower 99th-percentile latency, and similar and 58% higher throughput than InfiniBand, and TIMELY and DCQCN.</description><subject>Congestion</subject><subject>congestion control</subject><subject>Convergence</subject><subject>Datacenters</subject><subject>Dispersion</subject><subject>Flow deflection</subject><subject>Hardware</subject><subject>IEEE transactions</subject><subject>Iterative methods</subject><subject>Network latency</subject><subject>RDMA</subject><subject>Receivers</subject><subject>Switches</subject><subject>Switching 
theory</subject><subject>Throughput</subject><issn>1063-6692</issn><issn>1558-2566</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kE1LAzEQhhdRsFZ_gHgJeN6aj0028Va7rQq1Qq1HCXF3Iql1U5NU0V_vlhZPM4fnfWd4suyc4AEhWF0tZuPFgGKiBlQJIkpykPUI5zKnXIjDbseC5UIoepydxLjEmDBMRS97qUxI16hyX64BZNoGPa2hdmblfgFZH9DExITmENe-jYCSRyPfvkFMzrfItWhePQzzGxOhQZVJpoY2QUAzSN8-vMfT7MiaVYSz_exnz5PxYnSXTx9v70fDaV4zJlIupG1kTUtJFRScWm6ZxARE81rgwkpCrVGmVEBJyUpRE8YbjIvaYGsagQvO-tnlrncd_Oem-04v_Sa03UlNGZeKMMJYR5EdVQcfYwCr18F9mPCjCdZbi3prUW8t6r3FLnOxyzgA-Oel4orxgv0B83Jsiw</recordid><startdate>202002</startdate><enddate>202002</enddate><creator>Xue, Jiachen</creator><creator>Chaudhry, Muhammad Usama</creator><creator>Vamanan, Balajee</creator><creator>Vijaykumar, T. N.</creator><creator>Thottethodi, Mithuna</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-7581-6624</orcidid></search><sort><creationdate>202002</creationdate><title>Dart: Divide and Specialize for Fast Response to Congestion in RDMA-Based Datacenter Networks</title><author>Xue, Jiachen ; Chaudhry, Muhammad Usama ; Vamanan, Balajee ; Vijaykumar, T. N. 
; Thottethodi, Mithuna</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c336t-68fd8c27829e452f5f3801e6db404f812fa9a79e217376c135d004ca0fad60453</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Congestion</topic><topic>congestion control</topic><topic>Convergence</topic><topic>Datacenters</topic><topic>Dispersion</topic><topic>Flow deflection</topic><topic>Hardware</topic><topic>IEEE transactions</topic><topic>Iterative methods</topic><topic>Network latency</topic><topic>RDMA</topic><topic>Receivers</topic><topic>Switches</topic><topic>Switching theory</topic><topic>Throughput</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Xue, Jiachen</creatorcontrib><creatorcontrib>Chaudhry, Muhammad Usama</creatorcontrib><creatorcontrib>Vamanan, Balajee</creatorcontrib><creatorcontrib>Vijaykumar, T. N.</creatorcontrib><creatorcontrib>Thottethodi, Mithuna</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE/ACM transactions on networking</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Xue, Jiachen</au><au>Chaudhry, Muhammad Usama</au><au>Vamanan, 
Balajee</au><au>Vijaykumar, T. N.</au><au>Thottethodi, Mithuna</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Dart: Divide and Specialize for Fast Response to Congestion in RDMA-Based Datacenter Networks</atitle><jtitle>IEEE/ACM transactions on networking</jtitle><stitle>TNET</stitle><date>2020-02</date><risdate>2020</risdate><volume>28</volume><issue>1</issue><spage>322</spage><epage>335</epage><pages>322-335</pages><issn>1063-6692</issn><eissn>1558-2566</eissn><coden>IEANEP</coden><abstract>Though Remote Direct Memory Access (RDMA) promises to reduce datacenter network latencies significantly compared to TCP (e.g., 10x), end-to-end congestion control in the presence of incasts is a challenge. Targeting the full generality of the congestion problem, previous schemes rely on slow, iterative convergence to the appropriate sending rates (e.g., TIMELY takes 50 RTTs). Several papers have shown that even in oversubscribed datacenter networks most congestion occurs at the receiver. Accordingly, we propose a divide-and-specialize approach, called Dart, which isolates the common case of receiver congestion and further subdivides the remaining in-network congestion into the simpler spatially-localized and the harder spatially-dispersed cases. For receiver congestion, we propose direct apportioning of sending rates (DASR) in which a receiver for n senders directs each sender to cut its rate by a factor of n, converging in only one RTT. For the spatially-localized case, Dart provides fast (under one RTT) response by adding novel switch hardware for in-order flow deflection (IOFD) because RDMA disallows packet reordering on which previous load balancing schemes rely. For the uncommon spatially-dispersed case, Dart falls back to DCQCN. 
Small-scale testbed measurements and at-scale simulations, respectively, show that Dart achieves 60% (2.5x) and 79% (4.8x) lower 99th-percentile latency, and similar and 58% higher throughput than InfiniBand, and TIMELY and DCQCN.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TNET.2019.2961671</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0002-7581-6624</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1063-6692
ispartof	IEEE/ACM transactions on networking, 2020-02, Vol.28 (1), p.322-335
issn	1063-6692 1558-2566
language	eng
recordid	cdi_crossref_primary_10_1109_TNET_2019_2961671
source	IEEE Electronic Library (IEL)
subjects	Congestion ; congestion control ; Convergence ; Datacenters ; Dispersion ; Flow deflection ; Hardware ; IEEE transactions ; Iterative methods ; Network latency ; RDMA ; Receivers ; Switches ; Switching theory ; Throughput
title	Dart: Divide and Specialize for Fast Response to Congestion in RDMA-Based Datacenter Networks
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T20%3A22%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Dart:%20Divide%20and%20Specialize%20for%20Fast%20Response%20to%20Congestion%20in%20RDMA-Based%20Datacenter%20Networks&rft.jtitle=IEEE/ACM%20transactions%20on%20networking&rft.au=Xue,%20Jiachen&rft.date=2020-02&rft.volume=28&rft.issue=1&rft.spage=322&rft.epage=335&rft.pages=322-335&rft.issn=1063-6692&rft.eissn=1558-2566&rft.coden=IEANEP&rft_id=info:doi/10.1109/TNET.2019.2961671&rft_dat=%3Cproquest_RIE%3E2358913133%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2358913133&rft_id=info:pmid/&rft_ieee_id=8959354&rfr_iscdi=true