RackSched: A Microsecond-Scale Scheduler for Rack-Scale Computers (Technical Report)

Low-latency online services have strict Service Level Objectives (SLOs) that require datacenter systems to support high throughput at microsecond-scale tail latency. Dataplane operating systems have been designed to scale up multi-core servers with minimal overhead for such SLOs. However, as applica...

Bibliographic Details
Main authors:	Zhu, Hang; Kaffes, Kostis; Chen, Zixu; Liu, Zhenming; Kozyrakis, Christos; Stoica, Ion; Jin, Xin
Format:	Article
Language:	eng
Subjects:	Computer Science - Distributed, Parallel, and Cluster Computing
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Zhu, Hang; Kaffes, Kostis; Chen, Zixu; Liu, Zhenming; Kozyrakis, Christos; Stoica, Ion; Jin, Xin
description	Low-latency online services have strict Service Level Objectives (SLOs) that require datacenter systems to support high throughput at microsecond-scale tail latency. Dataplane operating systems have been designed to scale up multi-core servers with minimal overhead for such SLOs. However, as application demands continue to increase, scaling up is not enough, and serving larger demands requires these systems to scale out to multiple servers in a rack. We present RackSched, the first rack-level microsecond-scale scheduler that provides the abstraction of a rack-scale computer (i.e., a huge server with hundreds to thousands of cores) to an external service through network-system co-design. The core of RackSched is a two-layer scheduling framework that integrates inter-server scheduling in the top-of-rack (ToR) switch with intra-server scheduling in each server. We use a combination of analytical results and simulations to show that the framework performs close to centralized scheduling policies and is robust for both low-dispersion and high-dispersion workloads. We design a custom switch data plane for the inter-server scheduler, which realizes power-of-k-choices, ensures request affinity, and tracks server loads accurately and efficiently. We implement a RackSched prototype on a cluster of commodity servers connected by a Barefoot Tofino switch. End-to-end experiments on a twelve-server testbed show that RackSched improves throughput by up to 1.44x and scales out throughput near linearly, while maintaining the same tail latency as one server until the system is saturated.
doi_str_mv	10.48550/arxiv.2010.05969
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2010_05969</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2010_05969</sourcerecordid><originalsourceid>FETCH-LOGICAL-a679-e9411f30cb81a9af355610ac32599f984bf886fc51c5d247bafa5262166d24973</originalsourceid><addsrcrecordid>eNotjztvwjAUhb10qGh_QKd6hCHUj9ix2VDUQiWqSpA9urmxRUTAkQNV--8bAtPReehIHyEvnM1ToxR7g_jb_MwFGwKmrLaPpNgCHna4d_WCLulXgzH0DsOpTnYIraNjdWldpD5Eeh3fizwcu8vZxZ5OC4f7UzOkdOu6EM-zJ_Lgoe3d810npPh4L_J1svlefebLTQI6s4mzKedeMqwMBwteKqU5A5RCWeutSStvjPaoOKpapFkFHpTQgms9WJvJCXm93Y5YZRebI8S_8opXjnjyH7LZSWM</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>RackSched: A Microsecond-Scale Scheduler for Rack-Scale Computers (Technical Report)</title><source>arXiv.org</source><creator>Zhu, Hang ; Kaffes, Kostis ; Chen, Zixu ; Liu, Zhenming ; Kozyrakis, Christos ; Stoica, Ion ; Jin, Xin</creator><creatorcontrib>Zhu, Hang ; Kaffes, Kostis ; Chen, Zixu ; Liu, Zhenming ; Kozyrakis, Christos ; Stoica, Ion ; Jin, Xin</creatorcontrib><description>Low-latency online services have strict Service Level Objectives (SLOs) that require datacenter systems to support high throughput at microsecond-scale tail latency. Dataplane operating systems have been designed to scale up multi-core servers with minimal overhead for such SLOs. However, as application demands continue to increase, scaling up is not enough, and serving larger demands requires these systems to scale out to multiple servers in a rack. We present RackSched, the first rack-level microsecond-scale scheduler that provides the abstraction of a rack-scale computer (i.e., a huge server with hundreds to thousands of cores) to an external service with network-system co-design. 
The core of RackSched is a two-layer scheduling framework that integrates inter-server scheduling in the top-of-rack (ToR) switch with intra-server scheduling in each server. We use a combination of analytical results and simulations to show that it provides near-optimal performance as centralized scheduling policies, and is robust for both low-dispersion and high-dispersion workloads. We design a custom switch data plane for the inter-server scheduler, which realizes power-of-k-choices, ensures request affinity, and tracks server loads accurately and efficiently. We implement a RackSched prototype on a cluster of commodity servers connected by a Barefoot Tofino switch. End-to-end experiments on a twelve-server testbed show that RackSched improves the throughput by up to 1.44x, and scales out the throughput near linearly, while maintaining the same tail latency as one server until the system is saturated.</description><identifier>DOI: 10.48550/arxiv.2010.05969</identifier><language>eng</language><subject>Computer Science - Distributed, Parallel, and Cluster Computing</subject><creationdate>2020-10</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2010.05969$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2010.05969$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Zhu, Hang</creatorcontrib><creatorcontrib>Kaffes, Kostis</creatorcontrib><creatorcontrib>Chen, Zixu</creatorcontrib><creatorcontrib>Liu, Zhenming</creatorcontrib><creatorcontrib>Kozyrakis, 
Christos</creatorcontrib><creatorcontrib>Stoica, Ion</creatorcontrib><creatorcontrib>Jin, Xin</creatorcontrib><title>RackSched: A Microsecond-Scale Scheduler for Rack-Scale Computers (Technical Report)</title><description>Low-latency online services have strict Service Level Objectives (SLOs) that require datacenter systems to support high throughput at microsecond-scale tail latency. Dataplane operating systems have been designed to scale up multi-core servers with minimal overhead for such SLOs. However, as application demands continue to increase, scaling up is not enough, and serving larger demands requires these systems to scale out to multiple servers in a rack. We present RackSched, the first rack-level microsecond-scale scheduler that provides the abstraction of a rack-scale computer (i.e., a huge server with hundreds to thousands of cores) to an external service with network-system co-design. The core of RackSched is a two-layer scheduling framework that integrates inter-server scheduling in the top-of-rack (ToR) switch with intra-server scheduling in each server. We use a combination of analytical results and simulations to show that it provides near-optimal performance as centralized scheduling policies, and is robust for both low-dispersion and high-dispersion workloads. We design a custom switch data plane for the inter-server scheduler, which realizes power-of-k-choices, ensures request affinity, and tracks server loads accurately and efficiently. We implement a RackSched prototype on a cluster of commodity servers connected by a Barefoot Tofino switch. 
End-to-end experiments on a twelve-server testbed show that RackSched improves the throughput by up to 1.44x, and scales out the throughput near linearly, while maintaining the same tail latency as one server until the system is saturated.</description><subject>Computer Science - Distributed, Parallel, and Cluster Computing</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotjztvwjAUhb10qGh_QKd6hCHUj9ix2VDUQiWqSpA9urmxRUTAkQNV--8bAtPReehIHyEvnM1ToxR7g_jb_MwFGwKmrLaPpNgCHna4d_WCLulXgzH0DsOpTnYIraNjdWldpD5Eeh3fizwcu8vZxZ5OC4f7UzOkdOu6EM-zJ_Lgoe3d810npPh4L_J1svlefebLTQI6s4mzKedeMqwMBwteKqU5A5RCWeutSStvjPaoOKpapFkFHpTQgms9WJvJCXm93Y5YZRebI8S_8opXjnjyH7LZSWM</recordid><startdate>20201012</startdate><enddate>20201012</enddate><creator>Zhu, Hang</creator><creator>Kaffes, Kostis</creator><creator>Chen, Zixu</creator><creator>Liu, Zhenming</creator><creator>Kozyrakis, Christos</creator><creator>Stoica, Ion</creator><creator>Jin, Xin</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20201012</creationdate><title>RackSched: A Microsecond-Scale Scheduler for Rack-Scale Computers (Technical Report)</title><author>Zhu, Hang ; Kaffes, Kostis ; Chen, Zixu ; Liu, Zhenming ; Kozyrakis, Christos ; Stoica, Ion ; Jin, Xin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a679-e9411f30cb81a9af355610ac32599f984bf886fc51c5d247bafa5262166d24973</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Computer Science - Distributed, Parallel, and Cluster Computing</topic><toplevel>online_resources</toplevel><creatorcontrib>Zhu, Hang</creatorcontrib><creatorcontrib>Kaffes, Kostis</creatorcontrib><creatorcontrib>Chen, Zixu</creatorcontrib><creatorcontrib>Liu, Zhenming</creatorcontrib><creatorcontrib>Kozyrakis, 
Christos</creatorcontrib><creatorcontrib>Stoica, Ion</creatorcontrib><creatorcontrib>Jin, Xin</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhu, Hang</au><au>Kaffes, Kostis</au><au>Chen, Zixu</au><au>Liu, Zhenming</au><au>Kozyrakis, Christos</au><au>Stoica, Ion</au><au>Jin, Xin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>RackSched: A Microsecond-Scale Scheduler for Rack-Scale Computers (Technical Report)</atitle><date>2020-10-12</date><risdate>2020</risdate><abstract>Low-latency online services have strict Service Level Objectives (SLOs) that require datacenter systems to support high throughput at microsecond-scale tail latency. Dataplane operating systems have been designed to scale up multi-core servers with minimal overhead for such SLOs. However, as application demands continue to increase, scaling up is not enough, and serving larger demands requires these systems to scale out to multiple servers in a rack. We present RackSched, the first rack-level microsecond-scale scheduler that provides the abstraction of a rack-scale computer (i.e., a huge server with hundreds to thousands of cores) to an external service with network-system co-design. The core of RackSched is a two-layer scheduling framework that integrates inter-server scheduling in the top-of-rack (ToR) switch with intra-server scheduling in each server. We use a combination of analytical results and simulations to show that it provides near-optimal performance as centralized scheduling policies, and is robust for both low-dispersion and high-dispersion workloads. We design a custom switch data plane for the inter-server scheduler, which realizes power-of-k-choices, ensures request affinity, and tracks server loads accurately and efficiently. 
We implement a RackSched prototype on a cluster of commodity servers connected by a Barefoot Tofino switch. End-to-end experiments on a twelve-server testbed show that RackSched improves the throughput by up to 1.44x, and scales out the throughput near linearly, while maintaining the same tail latency as one server until the system is saturated.</abstract><doi>10.48550/arxiv.2010.05969</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2010.05969
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2010_05969
source	arXiv.org
subjects	Computer Science - Distributed, Parallel, and Cluster Computing
title	RackSched: A Microsecond-Scale Scheduler for Rack-Scale Computers (Technical Report)
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T20%3A08%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=RackSched:%20A%20Microsecond-Scale%20Scheduler%20for%20Rack-Scale%20Computers%20(Technical%20Report)&rft.au=Zhu,%20Hang&rft.date=2020-10-12&rft_id=info:doi/10.48550/arxiv.2010.05969&rft_dat=%3Carxiv_GOX%3E2010_05969%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
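
Illustrative note on the description field: the abstract says the inter-server scheduler realizes power-of-k-choices, ensures request affinity, and tracks server loads. The following is a minimal Python sketch of that general idea only, not the authors' switch data plane. The class name, method names, and the choice to measure load as the number of outstanding requests are all assumptions made for illustration.

```python
import random


class InterServerScheduler:
    """Toy model of a rack-level dispatcher (all names are hypothetical)."""

    def __init__(self, num_servers, k=2, rng=None):
        if not 1 <= k <= num_servers:
            raise ValueError("k must be between 1 and num_servers")
        # Assumption: load is the count of outstanding requests per server.
        self.loads = [0] * num_servers
        self.k = k
        self.affinity = {}  # request id -> index of the server handling it
        self.rng = rng or random.Random()

    def dispatch(self, req_id):
        """Return the server index for a packet belonging to req_id."""
        # Request affinity: later packets of a known request follow the first.
        if req_id in self.affinity:
            return self.affinity[req_id]
        # Power-of-k-choices: sample k distinct servers, keep the least loaded.
        candidates = self.rng.sample(range(len(self.loads)), self.k)
        server = min(candidates, key=lambda s: self.loads[s])
        self.loads[server] += 1
        self.affinity[req_id] = server
        return server

    def complete(self, req_id):
        """Record that req_id finished and release its affinity entry."""
        server = self.affinity.pop(req_id)
        self.loads[server] -= 1
        return server
```

Sampling only k servers per request keeps the per-request work constant regardless of rack size, while still steering most requests away from heavily loaded servers; setting k to the number of servers degenerates into always picking a least-loaded server.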