Kronos: towards bus contention-aware job scheduling in warehouse scale computers
While researchers have proposed many techniques to mitigate the contention on the shared cache and memory bandwidth, none of them has considered the memory bus contention due to split lock. Our study shows that the split lock may cause 9X longer data access latency without saturating the memory band...
Published in: | Frontiers of Computer Science 2023-02, Vol.17 (1), p.171101, Article 171101 |
---|---|
Main authors: | XUE, Shuai; ZHAO, Shang; CHEN, Quan; SONG, Zhuo; CHEN, Shanpei; MA, Tao; YANG, Yong; ZHENG, Wenli; GUO, Minyi |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 1 |
container_start_page | 171101 |
container_title | Frontiers of Computer Science |
container_volume | 17 |
creator | XUE, Shuai; ZHAO, Shang; CHEN, Quan; SONG, Zhuo; CHEN, Shanpei; MA, Tao; YANG, Yong; ZHENG, Wenli; GUO, Minyi |
description | While researchers have proposed many techniques to mitigate the contention on the shared cache and memory bandwidth, none of them has considered the memory bus contention due to split lock. Our study shows that the split lock may cause 9X longer data access latency without saturating the memory bandwidth. To minimize the impact of split lock, we propose Kronos, a runtime system composed of an online bus contention tolerance meter and a bus contention-aware job scheduler. The meter characterizes the tolerance of jobs to the “pressure” of bus contention and builds a tolerance model with the polynomial regression technique. The job scheduler allocates user jobs to the physical nodes in a contention aware manner. We design three scheduling policies that minimize the number of required nodes while ensuring the Service Level Agreement (SLA) of all the user jobs, minimize the number of jobs that suffer from SLA violation without enough nodes, and maximize the overall performance without considering the SLA violation, respectively. Adopting the three policies, Kronos reduces the number of the required nodes by 42.1% while ensuring the SLA of all the jobs, reduces the number of the jobs that suffer from SLA violation without enough nodes by 72.8%, and improves the overall performance by 35.2% without considering SLA. |
doi_str_mv | 10.1007/s11704-021-0418-5 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2918721605</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2918721605</sourcerecordid><originalsourceid>FETCH-LOGICAL-c393t-13ad3aea4f87689eaa6833f0a6723c86e22ac928ad07cf241751dc49f8a292113</originalsourceid><addsrcrecordid>eNp9kc1LwzAchoMoOOb-AG8Fz9HklzYf3mT4hQM96Dlkabp2bMlMWsT_3pSK3nZKeHmfN_AEoUtKrikh4iZRKkiJCVBMSipxdYJmQFSFARg__buDPEeLlLaEECBQVQAz9PYSgw_ptujDl4l1KtZDKmzwvfN9Fzw2OXXFNqyLZFtXD7vOb4rOF2PchiG5nJudy8j-MPQupgt01phdcovfc44-Hu7fl0949fr4vLxbYcsU6zFlpmbGmbKRgkvljOGSsYYYLoBZyR2AsQqkqYmwDZRUVLS2pWqkAQWUsjm6mnYPMXwOLvV6G4bo85MaFJUCKCfV0RZXSkkQnOcWnVo2hpSia_QhdnsTvzUlejSsJ8M6G9ajYT0uw8Sk3PUbF_-Xj0Fygtpu07ro6kN0Kekm_0HfZXlH0B8_vo8T</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2918721605</pqid></control><display><type>article</type><title>Kronos: towards bus contention-aware job scheduling in warehouse scale computers</title><source>ProQuest Central UK/Ireland</source><source>SpringerLink Journals - AutoHoldings</source><source>ProQuest Central</source><creator>XUE, Shuai ; ZHAO, Shang ; CHEN, Quan ; SONG, Zhuo ; CHEN, Shanpei ; MA, Tao ; YANG, Yong ; ZHENG, Wenli ; GUO, Minyi</creator><creatorcontrib>XUE, Shuai ; ZHAO, Shang ; CHEN, Quan ; SONG, Zhuo ; CHEN, Shanpei ; MA, Tao ; YANG, Yong ; ZHENG, Wenli ; GUO, Minyi</creatorcontrib><description>While researchers have proposed many techniques to mitigate the contention on the shared cache and memory bandwidth, none of them has considered the memory bus contention due to split lock. Our study shows that the split lock may cause 9X longer data access latency without saturating the memory bandwidth. To minimize the impact of split lock, we propose Kronos, a runtime system composed of an online bus contention tolerance meter and a bus contention-aware job scheduler. 
The meter characterizes the tolerance of jobs to the “pressure” of bus contention and builds a tolerance model with the polynomial regression technique. The job scheduler allocates user jobs to the physical nodes in a contention aware manner. We design three scheduling policies that minimize the number of required nodes while ensuring the Service Level Agreement (SLA) of all the user jobs, minimize the number of jobs that suffer from SLA violation without enough nodes, and maximize the overall performance without considering the SLA violation, respectively. Adopting the three policies, Kronos reduces the number of the required nodes by 42.1% while ensuring the SLA of all the jobs, reduces the number of the jobs that suffer from SLA violation without enough nodes by 72.8%, and improves the overall performance by 35.2% without considering SLA.</description><identifier>ISSN: 2095-2228</identifier><identifier>EISSN: 2095-2236</identifier><identifier>DOI: 10.1007/s11704-021-0418-5</identifier><language>eng</language><publisher>Beijing: Higher Education Press</publisher><subject>Bandwidths ; bus contention ; cloud ; Computer Science ; Employment ; high performance ; Nodes ; Policies ; Polynomials ; Regression models ; Research Article ; schedule ; Scheduling ; split lock ; Tenants</subject><ispartof>Frontiers of Computer Science, 2023-02, Vol.17 (1), p.171101, Article 171101</ispartof><rights>Copyright reserved, 2021, Higher Education Press 2021</rights><rights>Higher Education Press 2023</rights><rights>Higher Education Press 
2023.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c393t-13ad3aea4f87689eaa6833f0a6723c86e22ac928ad07cf241751dc49f8a292113</citedby><cites>FETCH-LOGICAL-c393t-13ad3aea4f87689eaa6833f0a6723c86e22ac928ad07cf241751dc49f8a292113</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11704-021-0418-5$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2918721605?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,780,784,21387,27923,27924,33743,41487,42556,43804,51318,64384,64388,72240</link.rule.ids></links><search><creatorcontrib>XUE, Shuai</creatorcontrib><creatorcontrib>ZHAO, Shang</creatorcontrib><creatorcontrib>CHEN, Quan</creatorcontrib><creatorcontrib>SONG, Zhuo</creatorcontrib><creatorcontrib>CHEN, Shanpei</creatorcontrib><creatorcontrib>MA, Tao</creatorcontrib><creatorcontrib>YANG, Yong</creatorcontrib><creatorcontrib>ZHENG, Wenli</creatorcontrib><creatorcontrib>GUO, Minyi</creatorcontrib><title>Kronos: towards bus contention-aware job scheduling in warehouse scale computers</title><title>Frontiers of Computer Science</title><addtitle>Front. Comput. Sci</addtitle><description>While researchers have proposed many techniques to mitigate the contention on the shared cache and memory bandwidth, none of them has considered the memory bus contention due to split lock. Our study shows that the split lock may cause 9X longer data access latency without saturating the memory bandwidth. To minimize the impact of split lock, we propose Kronos, a runtime system composed of an online bus contention tolerance meter and a bus contention-aware job scheduler. 
The meter characterizes the tolerance of jobs to the “pressure” of bus contention and builds a tolerance model with the polynomial regression technique. The job scheduler allocates user jobs to the physical nodes in a contention aware manner. We design three scheduling policies that minimize the number of required nodes while ensuring the Service Level Agreement (SLA) of all the user jobs, minimize the number of jobs that suffer from SLA violation without enough nodes, and maximize the overall performance without considering the SLA violation, respectively. Adopting the three policies, Kronos reduces the number of the required nodes by 42.1% while ensuring the SLA of all the jobs, reduces the number of the jobs that suffer from SLA violation without enough nodes by 72.8%, and improves the overall performance by 35.2% without considering SLA.</description><subject>Bandwidths</subject><subject>bus contention</subject><subject>cloud</subject><subject>Computer Science</subject><subject>Employment</subject><subject>high performance</subject><subject>Nodes</subject><subject>Policies</subject><subject>Polynomials</subject><subject>Regression models</subject><subject>Research Article</subject><subject>schedule</subject><subject>Scheduling</subject><subject>split 
lock</subject><subject>Tenants</subject><issn>2095-2228</issn><issn>2095-2236</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp9kc1LwzAchoMoOOb-AG8Fz9HklzYf3mT4hQM96Dlkabp2bMlMWsT_3pSK3nZKeHmfN_AEoUtKrikh4iZRKkiJCVBMSipxdYJmQFSFARg__buDPEeLlLaEECBQVQAz9PYSgw_ptujDl4l1KtZDKmzwvfN9Fzw2OXXFNqyLZFtXD7vOb4rOF2PchiG5nJudy8j-MPQupgt01phdcovfc44-Hu7fl0949fr4vLxbYcsU6zFlpmbGmbKRgkvljOGSsYYYLoBZyR2AsQqkqYmwDZRUVLS2pWqkAQWUsjm6mnYPMXwOLvV6G4bo85MaFJUCKCfV0RZXSkkQnOcWnVo2hpSia_QhdnsTvzUlejSsJ8M6G9ajYT0uw8Sk3PUbF_-Xj0Fygtpu07ro6kN0Kekm_0HfZXlH0B8_vo8T</recordid><startdate>20230201</startdate><enddate>20230201</enddate><creator>XUE, Shuai</creator><creator>ZHAO, Shang</creator><creator>CHEN, Quan</creator><creator>SONG, Zhuo</creator><creator>CHEN, Shanpei</creator><creator>MA, Tao</creator><creator>YANG, Yong</creator><creator>ZHENG, Wenli</creator><creator>GUO, Minyi</creator><general>Higher Education Press</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>JQ2</scope><scope>8FE</scope><scope>8FG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>K7-</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope></search><sort><creationdate>20230201</creationdate><title>Kronos: towards bus contention-aware job scheduling in warehouse scale computers</title><author>XUE, Shuai ; ZHAO, Shang ; CHEN, Quan ; SONG, Zhuo ; CHEN, Shanpei ; MA, Tao ; YANG, Yong ; ZHENG, Wenli ; GUO, 
Minyi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c393t-13ad3aea4f87689eaa6833f0a6723c86e22ac928ad07cf241751dc49f8a292113</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Bandwidths</topic><topic>bus contention</topic><topic>cloud</topic><topic>Computer Science</topic><topic>Employment</topic><topic>high performance</topic><topic>Nodes</topic><topic>Policies</topic><topic>Polynomials</topic><topic>Regression models</topic><topic>Research Article</topic><topic>schedule</topic><topic>Scheduling</topic><topic>split lock</topic><topic>Tenants</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>XUE, Shuai</creatorcontrib><creatorcontrib>ZHAO, Shang</creatorcontrib><creatorcontrib>CHEN, Quan</creatorcontrib><creatorcontrib>SONG, Zhuo</creatorcontrib><creatorcontrib>CHEN, Shanpei</creatorcontrib><creatorcontrib>MA, Tao</creatorcontrib><creatorcontrib>YANG, Yong</creatorcontrib><creatorcontrib>ZHENG, Wenli</creatorcontrib><creatorcontrib>GUO, Minyi</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace 
Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><jtitle>Frontiers of Computer Science</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>XUE, Shuai</au><au>ZHAO, Shang</au><au>CHEN, Quan</au><au>SONG, Zhuo</au><au>CHEN, Shanpei</au><au>MA, Tao</au><au>YANG, Yong</au><au>ZHENG, Wenli</au><au>GUO, Minyi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Kronos: towards bus contention-aware job scheduling in warehouse scale computers</atitle><jtitle>Frontiers of Computer Science</jtitle><stitle>Front. Comput. Sci</stitle><date>2023-02-01</date><risdate>2023</risdate><volume>17</volume><issue>1</issue><spage>171101</spage><pages>171101-</pages><artnum>171101</artnum><issn>2095-2228</issn><eissn>2095-2236</eissn><abstract>While researchers have proposed many techniques to mitigate the contention on the shared cache and memory bandwidth, none of them has considered the memory bus contention due to split lock. Our study shows that the split lock may cause 9X longer data access latency without saturating the memory bandwidth. To minimize the impact of split lock, we propose Kronos, a runtime system composed of an online bus contention tolerance meter and a bus contention-aware job scheduler. The meter characterizes the tolerance of jobs to the “pressure” of bus contention and builds a tolerance model with the polynomial regression technique. The job scheduler allocates user jobs to the physical nodes in a contention aware manner. 
We design three scheduling policies that minimize the number of required nodes while ensuring the Service Level Agreement (SLA) of all the user jobs, minimize the number of jobs that suffer from SLA violation without enough nodes, and maximize the overall performance without considering the SLA violation, respectively. Adopting the three policies, Kronos reduces the number of the required nodes by 42.1% while ensuring the SLA of all the jobs, reduces the number of the jobs that suffer from SLA violation without enough nodes by 72.8%, and improves the overall performance by 35.2% without considering SLA.</abstract><cop>Beijing</cop><pub>Higher Education Press</pub><doi>10.1007/s11704-021-0418-5</doi></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2095-2228 |
ispartof | Frontiers of Computer Science, 2023-02, Vol.17 (1), p.171101, Article 171101 |
issn | 2095-2228; 2095-2236 |
language | eng |
recordid | cdi_proquest_journals_2918721605 |
source | ProQuest Central UK/Ireland; SpringerLink Journals - AutoHoldings; ProQuest Central |
subjects | Bandwidths; bus contention; cloud; Computer Science; Employment; high performance; Nodes; Policies; Polynomials; Regression models; Research Article; schedule; Scheduling; split lock; Tenants |
title | Kronos: towards bus contention-aware job scheduling in warehouse scale computers |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T22%3A58%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Kronos:%20towards%20bus%20contention-aware%20job%20scheduling%20in%20warehouse%20scale%20computers&rft.jtitle=Frontiers%20of%20Computer%20Science&rft.au=XUE,%20Shuai&rft.date=2023-02-01&rft.volume=17&rft.issue=1&rft.spage=171101&rft.pages=171101-&rft.artnum=171101&rft.issn=2095-2228&rft.eissn=2095-2236&rft_id=info:doi/10.1007/s11704-021-0418-5&rft_dat=%3Cproquest_cross%3E2918721605%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918721605&rft_id=info:pmid/&rfr_iscdi=true |
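
The description in this record names two components: a tolerance meter that builds a model with polynomial regression, and a scheduler that places jobs on nodes in a contention-aware manner. The Python sketch below only illustrates that general idea and is not the authors' implementation. The `Job` fields, the additive "pressure" model, the least-squares fit via normal equations, and the first-fit placement rule are all assumptions introduced here for illustration; the record does not specify how Kronos measures pressure or packs jobs.

```python
from dataclasses import dataclass
from typing import List, Sequence


def fit_polynomial(xs: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares polynomial fit; returns c where y ~ c[0] + c[1]*x + ..."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def predict(coeffs: Sequence[float], pressure: float) -> float:
    """Predicted slowdown of a job under the given bus-contention pressure."""
    return sum(c * pressure ** i for i, c in enumerate(coeffs))


@dataclass
class Job:
    """Hypothetical job description (all fields are illustrative assumptions)."""
    name: str
    pressure: float        # contention this job imposes on co-located jobs
    coeffs: List[float]    # fitted tolerance model for this job
    max_slowdown: float    # SLA expressed as a maximum tolerated slowdown


def fits(node: List[Job], job: Job) -> bool:
    """True if every job on the node, including the new one, stays within its SLA."""
    candidate = node + [job]
    total = sum(j.pressure for j in candidate)
    # Assumption: a job is affected by the summed pressure of its neighbours.
    return all(predict(j.coeffs, total - j.pressure) <= j.max_slowdown
               for j in candidate)


def first_fit(jobs: Sequence[Job]) -> List[List[Job]]:
    """Greedy first-fit placement; opens a new node only when no node fits."""
    nodes: List[List[Job]] = []
    for job in jobs:
        for node in nodes:
            if fits(node, job):
                node.append(job)
                break
        else:
            nodes.append([job])
    return nodes
```

In this sketch, a model would be fitted per job from measured (pressure, slowdown) samples, for example `fit_polynomial(xs, ys, 2)`, and `first_fit` would then be called with the resulting `Job` objects. First-fit is a simple stand-in for the first policy mentioned in the description (fewest nodes while keeping every SLA); it is a heuristic and does not guarantee the minimum node count.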