RecFlow: SDN-based receiver-driven flow scheduling in datacenters
Published in: | Cluster Computing 2020-03, Vol.23 (1), p.289-306 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Datacenter applications (e.g., web search, recommendation systems, and social networking) are designed with high fanout to achieve scalable performance. A corollary of this design is frequent fabric congestion (e.g., due to incast or imperfect hashing), even when network utilization is low. Such congestion varies both temporally and spatially (intra-rack and inter-rack). Two basic design paradigms address this issue, and current solutions lie somewhere between them. At one end are arbiter-based approaches, in which senders poll a centralized arbiter and collectively obey its global scheduling decisions. At the other end are self-adjusting endpoint-based approaches, in which senders independently adjust their transmission rates based on network congestion. The former incur greater overhead; the latter are simpler but yield suboptimal schedules. Our work seeks a middle ground: the optimality of arbiter-based approaches with the simplicity of self-adjusting endpoint-based approaches. Our key design principle is that the receiver has complete information about the flows destined for it, so the receiver itself can orchestrate those flows, rather than having a centralized arbiter schedule them or having senders make independent scheduling decisions. Since multiple receivers may share a bottleneck link, datapath visibility should be used to ensure that the bottleneck capacity is shared fairly among receivers with minimum overhead. We propose RecFlow, a receiver-based proactive congestion control scheme. RecFlow uses OpenFlow-provided path visibility to track changing bottlenecks on the fly. It spaces TCP acknowledgements to prevent traffic bursts and to ensure that no receiver exceeds its fair share of the bottleneck capacity. The goal is to reduce buffer overflows while maintaining fairness among flows and high link utilization. Using extensive simulations and a real testbed evaluation, we show that, compared to the state of the art, RecFlow achieves up to 6× improvement in the inter-rack scenario and 1.5× in the intra-rack scenario while sharing the link capacity fairly among all flows. |
---|---|
ISSN: | 1386-7857 1573-7543 |
DOI: | 10.1007/s10586-019-02922-4 |
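
The ACK-spacing idea in the summary can be illustrated with a small calculation. The sketch below is not RecFlow's implementation, which this record does not describe in detail. It assumes an equal split of the bottleneck capacity among the receivers sharing it, and it assumes that each acknowledgement releases roughly one maximum-size segment from the sender. The names `fair_share_bps`, `ack_interval_s`, `ack_schedule`, and the constant `MSS_BYTES` are hypothetical and chosen for illustration only.

```python
MSS_BYTES = 1460  # assumed TCP maximum segment size (hypothetical constant)


def fair_share_bps(bottleneck_bps: float, num_receivers: int) -> float:
    """Equal split of the bottleneck capacity among receivers (an assumed policy)."""
    if num_receivers <= 0:
        raise ValueError("num_receivers must be positive")
    return bottleneck_bps / num_receivers


def ack_interval_s(share_bps: float, bytes_per_ack: int = MSS_BYTES) -> float:
    """Gap between ACKs so that ACK-clocked data arrives at about share_bps.

    Assumes each ACK triggers bytes_per_ack bytes of new data from the sender.
    """
    if share_bps <= 0:
        raise ValueError("share_bps must be positive")
    return bytes_per_ack * 8 / share_bps


def ack_schedule(start_s: float, count: int, interval_s: float) -> list[float]:
    """Evenly spaced ACK send times, which avoids releasing ACKs in a burst."""
    return [start_s + i * interval_s for i in range(count)]


if __name__ == "__main__":
    share = fair_share_bps(10e9, 4)   # four receivers behind a 10 Gbit/s link
    gap = ack_interval_s(share)       # seconds between consecutive ACKs
    print(f"share = {share:.3g} bit/s, ACK gap = {gap * 1e6:.3f} microseconds")
    print(ack_schedule(0.0, 3, gap))
```

Under these assumptions, four receivers behind a 10 Gbit/s bottleneck each get 2.5 Gbit/s, and spacing ACKs about 4.672 microseconds apart keeps each receiver's inbound traffic near that share. When the bottleneck or the number of competing receivers changes, the interval would be recomputed from the new values.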