Component-distinguishable Co-location and Resource Reclamation for High-throughput Computing
Cloud service providers improve resource utilization by co-locating latency-critical (LC) workloads with best-effort batch (BE) jobs in datacenters. However, they usually treat multi-component LCs as monolithic applications and treat BEs as “second-class citizens” when allocating resources to them....
Published in: | ACM transactions on computer systems 2024-05, Vol.42 (1-2), p.1-37, Article 2 |
---|---|
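Main authors: | Zhao, Laiping; Cui, Yushuai; Yang, Yanan; Zhou, Xiaobo; Qiu, Tie; Li, Keqiu; Bao, Yungang |
Format: | Article |
Language: | eng |
Subjects: | Cloud computing; Computer systems organization; Interference; Reclamation; Resource utilization; Rhythm; Workload; Workloads |
Online access: | Full text |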
container_end_page | 37 |
---|---|
container_issue | 1-2 |
container_start_page | 1 |
container_title | ACM transactions on computer systems |
container_volume | 42 |
creator | Zhao, Laiping; Cui, Yushuai; Yang, Yanan; Zhou, Xiaobo; Qiu, Tie; Li, Keqiu; Bao, Yungang |
description | Cloud service providers improve resource utilization by co-locating latency-critical (LC) workloads with best-effort batch (BE) jobs in datacenters. However, they usually treat multi-component LCs as monolithic applications and treat BEs as “second-class citizens” when allocating resources to them. Neglecting the inconsistent interference tolerance abilities of LC components and the inconsistent preemption loss of BE workloads can result in missed co-location opportunities for higher throughput. We present Rhythm, a co-location controller that deploys workloads and reclaims resources rhythmically for maximizing the system throughput while guaranteeing LC service’s tail latency requirement. The key idea is to differentiate the BE throughput launched with each LC component, that is, components with higher interference tolerance can be deployed together with more BE jobs. It also assigns different reclamation priority values to BEs by evaluating their preemption losses into a multi-level reclamation queue. We implement and evaluate Rhythm using workloads in the form of containerized processes and microservices. Experimental results show that it can improve the system throughput by 47.3%, CPU utilization by 38.6%, and memory bandwidth utilization by 45.4% while guaranteeing the tail latency requirement. |
doi_str_mv | 10.1145/3630006 |
format | Article |
publisher | New York, NY: ACM |
rights | Copyright Association for Computing Machinery May 2024 |
toplevel | peer_reviewed; online_resources |
oa | free_for_read |
orcidid | 0000-0003-1967-2192; 0000-0002-7772-458X; 0000-0003-2324-2523; 0009-0009-9413-5187; 0000-0002-2222-393X; 0000-0003-1758-3030; 0000-0001-6565-5276 |
fulltext | fulltext |
identifier | ISSN: 0734-2071 |
ispartof | ACM transactions on computer systems, 2024-05, Vol.42 (1-2), p.1-37, Article 2 |
issn | 0734-2071 (ISSN); 1557-7333 (EISSN) |
language | eng |
recordid | cdi_proquest_journals_3058280272 |
source | ACM Digital Library; Business Source Complete |
subjects | Cloud computing; Computer systems organization; Interference; Reclamation; Resource utilization; Rhythm; Workload; Workloads |
title | Component-distinguishable Co-location and Resource Reclamation for High-throughput Computing |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T06%3A54%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Component-distinguishable%20Co-location%20and%20Resource%20Reclamation%20for%20High-throughput%20Computing&rft.jtitle=ACM%20transactions%20on%20computer%20systems&rft.au=Zhao,%20Laiping&rft.date=2024-05-01&rft.volume=42&rft.issue=1-2&rft.spage=1&rft.epage=37&rft.pages=1-37&rft.artnum=2&rft.issn=0734-2071&rft.eissn=1557-7333&rft_id=info:doi/10.1145/3630006&rft_dat=%3Cproquest_cross%3E3058280272%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3058280272&rft_id=info:pmid/&rfr_iscdi=true |
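
The two ideas named in the abstract (more BE jobs next to LC components with higher interference tolerance, and a multi-level reclamation queue ordered by preemption loss) can be illustrated with a small sketch. This is not the Rhythm implementation: the abstract gives no formulas, so the proportional split, the fixed number of levels, the normalized `preemption_loss` score, and every name below (`BEJob`, `be_quota`, `build_reclamation_queue`, `reclaim`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BEJob:
    name: str
    cores: int              # CPU cores the job currently holds
    preemption_loss: float  # hypothetical normalized cost of preempting it, 0.0 to 1.0


def be_quota(tolerance: Dict[str, float], total_be_slots: int) -> Dict[str, int]:
    """Split BE slots across LC components in proportion to an assumed
    per-component interference-tolerance score (higher score, more BE jobs)."""
    total = sum(tolerance.values())
    return {comp: int(total_be_slots * t / total) for comp, t in tolerance.items()}


def build_reclamation_queue(jobs: List[BEJob], levels: int = 3) -> List[List[BEJob]]:
    """Bucket BE jobs into `levels` levels; level 0 holds the cheapest jobs to preempt."""
    queue: List[List[BEJob]] = [[] for _ in range(levels)]
    for job in jobs:
        level = min(int(job.preemption_loss * levels), levels - 1)
        queue[level].append(job)
    return queue


def reclaim(queue: List[List[BEJob]], cores_needed: int) -> List[BEJob]:
    """Pick victims level by level, lowest preemption loss first,
    until at least `cores_needed` cores would be freed."""
    victims: List[BEJob] = []
    freed = 0
    for level in queue:
        for job in sorted(level, key=lambda j: j.preemption_loss):
            if freed >= cores_needed:
                return victims
            victims.append(job)
            freed += job.cores
    return victims


if __name__ == "__main__":
    # Hypothetical tolerance scores for three LC components.
    print(be_quota({"frontend": 1, "cache": 3, "db": 1}, total_be_slots=10))
    jobs = [BEJob("a", 2, 0.1), BEJob("b", 4, 0.5), BEJob("c", 8, 0.9)]
    print([j.name for j in reclaim(build_reclamation_queue(jobs), cores_needed=5)])
```

In this sketch the job with the highest assumed preemption loss is reclaimed last, which mirrors the stated goal of not treating all BE jobs as equally disposable. How the paper actually measures tolerance and loss is described in the full text, not in this record.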