RackSched: A Microsecond-Scale Scheduler for Rack-Scale Computers (Technical Report)
Low-latency online services have strict Service Level Objectives (SLOs) that require datacenter systems to support high throughput at microsecond-scale tail latency. Dataplane operating systems have been designed to scale up multi-core servers with minimal overhead for such SLOs. However, as applica...
Main authors: | Zhu, Hang; Kaffes, Kostis; Chen, Zixu; Liu, Zhenming; Kozyrakis, Christos; Stoica, Ion; Jin, Xin |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Zhu, Hang ; Kaffes, Kostis ; Chen, Zixu ; Liu, Zhenming ; Kozyrakis, Christos ; Stoica, Ion ; Jin, Xin |
description | Low-latency online services have strict Service Level Objectives (SLOs) that
require datacenter systems to support high throughput at microsecond-scale tail
latency. Dataplane operating systems have been designed to scale up multi-core
servers with minimal overhead for such SLOs. However, as application demands
continue to increase, scaling up is not enough, and serving larger demands
requires these systems to scale out to multiple servers in a rack. We present
RackSched, the first rack-level microsecond-scale scheduler that provides the
abstraction of a rack-scale computer (i.e., a huge server with hundreds to
thousands of cores) to an external service with network-system co-design. The
core of RackSched is a two-layer scheduling framework that integrates
inter-server scheduling in the top-of-rack (ToR) switch with intra-server
scheduling in each server. We use a combination of analytical results and
simulations to show that its performance is near-optimal, close to that of centralized
scheduling policies, and is robust for both low-dispersion and high-dispersion
workloads. We design a custom switch data plane for the inter-server scheduler,
which realizes power-of-k-choices, ensures request affinity, and tracks server
loads accurately and efficiently. We implement a RackSched prototype on a
cluster of commodity servers connected by a Barefoot Tofino switch. End-to-end
experiments on a twelve-server testbed show that RackSched improves the
throughput by up to 1.44x, and scales out the throughput near linearly, while
maintaining the same tail latency as one server until the system is saturated. |
doi_str_mv | 10.48550/arxiv.2010.05969 |
format | Article |
date | 2020-10-12 |
oa | free_for_read |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
linktorsrc | https://arxiv.org/abs/2010.05969 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2010.05969 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2010_05969 |
source | arXiv.org |
subjects | Computer Science - Distributed, Parallel, and Cluster Computing |
title | RackSched: A Microsecond-Scale Scheduler for Rack-Scale Computers (Technical Report) |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T20%3A08%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=RackSched:%20A%20Microsecond-Scale%20Scheduler%20for%20Rack-Scale%20Computers%20(Technical%20Report)&rft.au=Zhu,%20Hang&rft.date=2020-10-12&rft_id=info:doi/10.48550/arxiv.2010.05969&rft_dat=%3Carxiv_GOX%3E2010_05969%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
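
The description field above says the inter-server scheduler realizes power-of-k-choices. As a rough illustration of that general policy only, the following Python sketch samples k servers at random and sends each request to the least loaded of them. It is a software simulation under simplifying assumptions (one unit of load per request, no request completion), not the authors' switch data plane; the names `pick_server` and `simulate` are hypothetical and do not come from the record.

```python
import random


def pick_server(loads, k, rng):
    """Sample k distinct servers uniformly and return the least loaded one."""
    candidates = rng.sample(range(len(loads)), k)
    return min(candidates, key=lambda i: loads[i])


def simulate(num_servers, num_requests, k, seed=0):
    """Assign requests one at a time and return the final per-server load.

    Assumption for illustration: each request adds one unit of load and
    never completes, so the result only shows how evenly the policy spreads work.
    """
    rng = random.Random(seed)
    loads = [0] * num_servers
    for _ in range(num_requests):
        loads[pick_server(loads, k, rng)] += 1
    return loads


if __name__ == "__main__":
    # Twelve servers mirrors the testbed size named in the description.
    for k in (1, 2, 12):
        loads = simulate(num_servers=12, num_requests=1200, k=k)
        print(k, max(loads) - min(loads))
```

With k = 1 the policy reduces to uniform random assignment, and with k equal to the number of servers it always picks a globally least-loaded server. Running the script prints the gap between the most and least loaded server for each k, which gives a quick feel for how sampling more candidates evens out the load.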