Kronos: towards bus contention-aware job scheduling in warehouse scale computers

While researchers have proposed many techniques to mitigate the contention on the shared cache and memory bandwidth, none of them has considered the memory bus contention due to split lock. Our study shows that the split lock may cause 9X longer data access latency without saturating the memory band...

Detailed description

Bibliographic details
Published in: Frontiers of Computer Science 2023-02, Vol.17 (1), p.171101, Article 171101
Main authors: XUE, Shuai, ZHAO, Shang, CHEN, Quan, SONG, Zhuo, CHEN, Shanpei, MA, Tao, YANG, Yong, ZHENG, Wenli, GUO, Minyi
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue 1
container_start_page 171101
container_title Frontiers of Computer Science
container_volume 17
creator XUE, Shuai
ZHAO, Shang
CHEN, Quan
SONG, Zhuo
CHEN, Shanpei
MA, Tao
YANG, Yong
ZHENG, Wenli
GUO, Minyi
description While researchers have proposed many techniques to mitigate the contention on the shared cache and memory bandwidth, none of them has considered the memory bus contention due to split lock. Our study shows that the split lock may cause 9X longer data access latency without saturating the memory bandwidth. To minimize the impact of split lock, we propose Kronos, a runtime system composed of an online bus contention tolerance meter and a bus contention-aware job scheduler. The meter characterizes the tolerance of jobs to the “pressure” of bus contention and builds a tolerance model with the polynomial regression technique. The job scheduler allocates user jobs to the physical nodes in a contention aware manner. We design three scheduling policies that minimize the number of required nodes while ensuring the Service Level Agreement (SLA) of all the user jobs, minimize the number of jobs that suffer from SLA violation without enough nodes, and maximize the overall performance without considering the SLA violation, respectively. Adopting the three policies, Kronos reduces the number of the required nodes by 42.1% while ensuring the SLA of all the jobs, reduces the number of the jobs that suffer from SLA violation without enough nodes by 72.8%, and improves the overall performance by 35.2% without considering SLA.
doi_str_mv 10.1007/s11704-021-0418-5
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2918721605</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2918721605</sourcerecordid><originalsourceid>FETCH-LOGICAL-c393t-13ad3aea4f87689eaa6833f0a6723c86e22ac928ad07cf241751dc49f8a292113</originalsourceid><addsrcrecordid>eNp9kc1LwzAchoMoOOb-AG8Fz9HklzYf3mT4hQM96Dlkabp2bMlMWsT_3pSK3nZKeHmfN_AEoUtKrikh4iZRKkiJCVBMSipxdYJmQFSFARg__buDPEeLlLaEECBQVQAz9PYSgw_ptujDl4l1KtZDKmzwvfN9Fzw2OXXFNqyLZFtXD7vOb4rOF2PchiG5nJudy8j-MPQupgt01phdcovfc44-Hu7fl0949fr4vLxbYcsU6zFlpmbGmbKRgkvljOGSsYYYLoBZyR2AsQqkqYmwDZRUVLS2pWqkAQWUsjm6mnYPMXwOLvV6G4bo85MaFJUCKCfV0RZXSkkQnOcWnVo2hpSia_QhdnsTvzUlejSsJ8M6G9ajYT0uw8Sk3PUbF_-Xj0Fygtpu07ro6kN0Kekm_0HfZXlH0B8_vo8T</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2918721605</pqid></control><display><type>article</type><title>Kronos: towards bus contention-aware job scheduling in warehouse scale computers</title><source>ProQuest Central UK/Ireland</source><source>SpringerLink Journals - AutoHoldings</source><source>ProQuest Central</source><creator>XUE, Shuai ; ZHAO, Shang ; CHEN, Quan ; SONG, Zhuo ; CHEN, Shanpei ; MA, Tao ; YANG, Yong ; ZHENG, Wenli ; GUO, Minyi</creator><creatorcontrib>XUE, Shuai ; ZHAO, Shang ; CHEN, Quan ; SONG, Zhuo ; CHEN, Shanpei ; MA, Tao ; YANG, Yong ; ZHENG, Wenli ; GUO, Minyi</creatorcontrib><description>While researchers have proposed many techniques to mitigate the contention on the shared cache and memory bandwidth, none of them has considered the memory bus contention due to split lock. Our study shows that the split lock may cause 9X longer data access latency without saturating the memory bandwidth. To minimize the impact of split lock, we propose Kronos, a runtime system composed of an online bus contention tolerance meter and a bus contention-aware job scheduler. 
The meter characterizes the tolerance of jobs to the “pressure” of bus contention and builds a tolerance model with the polynomial regression technique. The job scheduler allocates user jobs to the physical nodes in a contention aware manner. We design three scheduling policies that minimize the number of required nodes while ensuring the Service Level Agreement (SLA) of all the user jobs, minimize the number of jobs that suffer from SLA violation without enough nodes, and maximize the overall performance without considering the SLA violation, respectively. Adopting the three policies, Kronos reduces the number of the required nodes by 42.1% while ensuring the SLA of all the jobs, reduces the number of the jobs that suffer from SLA violation without enough nodes by 72.8%, and improves the overall performance by 35.2% without considering SLA.</description><identifier>ISSN: 2095-2228</identifier><identifier>EISSN: 2095-2236</identifier><identifier>DOI: 10.1007/s11704-021-0418-5</identifier><language>eng</language><publisher>Beijing: Higher Education Press</publisher><subject>Bandwidths ; bus contention ; cloud ; Computer Science ; Employment ; high performance ; Nodes ; Policies ; Polynomials ; Regression models ; Research Article ; schedule ; Scheduling ; split lock ; Tenants</subject><ispartof>Frontiers of Computer Science, 2023-02, Vol.17 (1), p.171101, Article 171101</ispartof><rights>Copyright reserved, 2021, Higher Education Press 2021</rights><rights>Higher Education Press 2023</rights><rights>Higher Education Press 
2023.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c393t-13ad3aea4f87689eaa6833f0a6723c86e22ac928ad07cf241751dc49f8a292113</citedby><cites>FETCH-LOGICAL-c393t-13ad3aea4f87689eaa6833f0a6723c86e22ac928ad07cf241751dc49f8a292113</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11704-021-0418-5$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2918721605?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,780,784,21387,27923,27924,33743,41487,42556,43804,51318,64384,64388,72240</link.rule.ids></links><search><creatorcontrib>XUE, Shuai</creatorcontrib><creatorcontrib>ZHAO, Shang</creatorcontrib><creatorcontrib>CHEN, Quan</creatorcontrib><creatorcontrib>SONG, Zhuo</creatorcontrib><creatorcontrib>CHEN, Shanpei</creatorcontrib><creatorcontrib>MA, Tao</creatorcontrib><creatorcontrib>YANG, Yong</creatorcontrib><creatorcontrib>ZHENG, Wenli</creatorcontrib><creatorcontrib>GUO, Minyi</creatorcontrib><title>Kronos: towards bus contention-aware job scheduling in warehouse scale computers</title><title>Frontiers of Computer Science</title><addtitle>Front. Comput. Sci</addtitle><description>While researchers have proposed many techniques to mitigate the contention on the shared cache and memory bandwidth, none of them has considered the memory bus contention due to split lock. Our study shows that the split lock may cause 9X longer data access latency without saturating the memory bandwidth. To minimize the impact of split lock, we propose Kronos, a runtime system composed of an online bus contention tolerance meter and a bus contention-aware job scheduler. 
The meter characterizes the tolerance of jobs to the “pressure” of bus contention and builds a tolerance model with the polynomial regression technique. The job scheduler allocates user jobs to the physical nodes in a contention aware manner. We design three scheduling policies that minimize the number of required nodes while ensuring the Service Level Agreement (SLA) of all the user jobs, minimize the number of jobs that suffer from SLA violation without enough nodes, and maximize the overall performance without considering the SLA violation, respectively. Adopting the three policies, Kronos reduces the number of the required nodes by 42.1% while ensuring the SLA of all the jobs, reduces the number of the jobs that suffer from SLA violation without enough nodes by 72.8%, and improves the overall performance by 35.2% without considering SLA.</description><subject>Bandwidths</subject><subject>bus contention</subject><subject>cloud</subject><subject>Computer Science</subject><subject>Employment</subject><subject>high performance</subject><subject>Nodes</subject><subject>Policies</subject><subject>Polynomials</subject><subject>Regression models</subject><subject>Research Article</subject><subject>schedule</subject><subject>Scheduling</subject><subject>split 
lock</subject><subject>Tenants</subject><issn>2095-2228</issn><issn>2095-2236</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp9kc1LwzAchoMoOOb-AG8Fz9HklzYf3mT4hQM96Dlkabp2bMlMWsT_3pSK3nZKeHmfN_AEoUtKrikh4iZRKkiJCVBMSipxdYJmQFSFARg__buDPEeLlLaEECBQVQAz9PYSgw_ptujDl4l1KtZDKmzwvfN9Fzw2OXXFNqyLZFtXD7vOb4rOF2PchiG5nJudy8j-MPQupgt01phdcovfc44-Hu7fl0949fr4vLxbYcsU6zFlpmbGmbKRgkvljOGSsYYYLoBZyR2AsQqkqYmwDZRUVLS2pWqkAQWUsjm6mnYPMXwOLvV6G4bo85MaFJUCKCfV0RZXSkkQnOcWnVo2hpSia_QhdnsTvzUlejSsJ8M6G9ajYT0uw8Sk3PUbF_-Xj0Fygtpu07ro6kN0Kekm_0HfZXlH0B8_vo8T</recordid><startdate>20230201</startdate><enddate>20230201</enddate><creator>XUE, Shuai</creator><creator>ZHAO, Shang</creator><creator>CHEN, Quan</creator><creator>SONG, Zhuo</creator><creator>CHEN, Shanpei</creator><creator>MA, Tao</creator><creator>YANG, Yong</creator><creator>ZHENG, Wenli</creator><creator>GUO, Minyi</creator><general>Higher Education Press</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>JQ2</scope><scope>8FE</scope><scope>8FG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>K7-</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope></search><sort><creationdate>20230201</creationdate><title>Kronos: towards bus contention-aware job scheduling in warehouse scale computers</title><author>XUE, Shuai ; ZHAO, Shang ; CHEN, Quan ; SONG, Zhuo ; CHEN, Shanpei ; MA, Tao ; YANG, Yong ; ZHENG, Wenli ; GUO, 
Minyi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c393t-13ad3aea4f87689eaa6833f0a6723c86e22ac928ad07cf241751dc49f8a292113</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Bandwidths</topic><topic>bus contention</topic><topic>cloud</topic><topic>Computer Science</topic><topic>Employment</topic><topic>high performance</topic><topic>Nodes</topic><topic>Policies</topic><topic>Polynomials</topic><topic>Regression models</topic><topic>Research Article</topic><topic>schedule</topic><topic>Scheduling</topic><topic>split lock</topic><topic>Tenants</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>XUE, Shuai</creatorcontrib><creatorcontrib>ZHAO, Shang</creatorcontrib><creatorcontrib>CHEN, Quan</creatorcontrib><creatorcontrib>SONG, Zhuo</creatorcontrib><creatorcontrib>CHEN, Shanpei</creatorcontrib><creatorcontrib>MA, Tao</creatorcontrib><creatorcontrib>YANG, Yong</creatorcontrib><creatorcontrib>ZHENG, Wenli</creatorcontrib><creatorcontrib>GUO, Minyi</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace 
Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><jtitle>Frontiers of Computer Science</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>XUE, Shuai</au><au>ZHAO, Shang</au><au>CHEN, Quan</au><au>SONG, Zhuo</au><au>CHEN, Shanpei</au><au>MA, Tao</au><au>YANG, Yong</au><au>ZHENG, Wenli</au><au>GUO, Minyi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Kronos: towards bus contention-aware job scheduling in warehouse scale computers</atitle><jtitle>Frontiers of Computer Science</jtitle><stitle>Front. Comput. Sci</stitle><date>2023-02-01</date><risdate>2023</risdate><volume>17</volume><issue>1</issue><spage>171101</spage><pages>171101-</pages><artnum>171101</artnum><issn>2095-2228</issn><eissn>2095-2236</eissn><abstract>While researchers have proposed many techniques to mitigate the contention on the shared cache and memory bandwidth, none of them has considered the memory bus contention due to split lock. Our study shows that the split lock may cause 9X longer data access latency without saturating the memory bandwidth. To minimize the impact of split lock, we propose Kronos, a runtime system composed of an online bus contention tolerance meter and a bus contention-aware job scheduler. The meter characterizes the tolerance of jobs to the “pressure” of bus contention and builds a tolerance model with the polynomial regression technique. The job scheduler allocates user jobs to the physical nodes in a contention aware manner. 
We design three scheduling policies that minimize the number of required nodes while ensuring the Service Level Agreement (SLA) of all the user jobs, minimize the number of jobs that suffer from SLA violation without enough nodes, and maximize the overall performance without considering the SLA violation, respectively. Adopting the three policies, Kronos reduces the number of the required nodes by 42.1% while ensuring the SLA of all the jobs, reduces the number of the jobs that suffer from SLA violation without enough nodes by 72.8%, and improves the overall performance by 35.2% without considering SLA.</abstract><cop>Beijing</cop><pub>Higher Education Press</pub><doi>10.1007/s11704-021-0418-5</doi></addata></record>
fulltext fulltext
identifier ISSN: 2095-2228
ispartof Frontiers of Computer Science, 2023-02, Vol.17 (1), p.171101, Article 171101
issn 2095-2228
2095-2236
language eng
recordid cdi_proquest_journals_2918721605
source ProQuest Central UK/Ireland; SpringerLink Journals - AutoHoldings; ProQuest Central
subjects Bandwidths
bus contention
cloud
Computer Science
Employment
high performance
Nodes
Policies
Polynomials
Regression models
Research Article
schedule
Scheduling
split lock
Tenants
title Kronos: towards bus contention-aware job scheduling in warehouse scale computers
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T22%3A58%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Kronos:%20towards%20bus%20contention-aware%20job%20scheduling%20in%20warehouse%20scale%20computers&rft.jtitle=Frontiers%20of%20Computer%20Science&rft.au=XUE,%20Shuai&rft.date=2023-02-01&rft.volume=17&rft.issue=1&rft.spage=171101&rft.pages=171101-&rft.artnum=171101&rft.issn=2095-2228&rft.eissn=2095-2236&rft_id=info:doi/10.1007/s11704-021-0418-5&rft_dat=%3Cproquest_cross%3E2918721605%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918721605&rft_id=info:pmid/&rfr_iscdi=true