Mitigating interference of microservices with a scoring mechanism in large-scale clusters
Co-locating latency-critical services (LCSs) and best-effort jobs (BEJs) constitutes the principal approach for enhancing resource utilization in production clusters. Nevertheless, the co-location practice hurts the performance of LCSs due to resource competition, even when employing isolation techno...
Published in: | The Journal of supercomputing 2025, Vol.81 (1), Article 104 |
---|---|
Main authors: | Yang, Dingyu ; Zheng, Kangpeng ; Qian, Shiyou ; Hua, Qin ; Zhang, Kaixuan ; Cao, Jian ; Xue, Guangtao |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 1 |
container_start_page | |
container_title | The Journal of supercomputing |
container_volume | 81 |
creator | Yang, Dingyu Zheng, Kangpeng Qian, Shiyou Hua, Qin Zhang, Kaixuan Cao, Jian Xue, Guangtao |
description | Co-locating latency-critical services (LCSs) and best-effort jobs (BEJs) constitutes the principal approach for enhancing resource utilization in production clusters. Nevertheless, the co-location practice hurts the performance of LCSs due to resource competition, even when employing isolation technology. Through an extensive analysis of voluminous real trace data derived from two production clusters, we observe that BEJs typically exhibit periodic execution patterns and serve as the primary sources of interference to LCSs. Furthermore, even at the same level of resource consumption, different compositions of BEJs can result in varying degrees of interference on LCSs. Subsequently, we propose PISM, a proactive Performance Interference Scoring and Mitigating framework for LCSs through the optimization of BEJ scheduling. Firstly, PISM adopts a data-driven approach to establish a characterization and classification methodology for BEJs. Secondly, PISM models the relationship between the composition of BEJs on servers and the response time (RT) of LCSs. Thirdly, PISM establishes an interference scoring mechanism in terms of RT, which serves as the foundation for BEJ scheduling. We assess the effectiveness of PISM on a small-scale cluster and through extensive data-driven simulations. The experimental results demonstrate that PISM can reduce cluster interference by up to 41.5%, and improve the throughput of long-tail LCSs by 76.4%. |
doi_str_mv | 10.1007/s11227-024-06534-7 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3122603467</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3122603467</sourcerecordid><originalsourceid>FETCH-LOGICAL-c200t-f632276adff0471766228504dd642b23249401da5e4a1e7655a3dc1f47d4f92a3</originalsourceid><addsrcrecordid>eNp9kLtOAzEQRS0EEiHwA1SWqA3jx9pJiSJeUhANFFSW8Y4TR_sI9i6Iv2eXRaKjmuaeOzOHkHMOlxzAXGXOhTAMhGKgC6mYOSAzXhjJQC3UIZnBUgBbFEock5OcdwCgpJEz8voYu7hxXWw2NDYdpoAJG4-0DbSOPrUZ00f0mOln7LbU0ezbNIZr9FvXxFwPGK1c2iDL3lVIfdXnoSefkqPgqoxnv3NOXm5vnlf3bP1097C6XjMvADoWtBwu164MAZThRmshFgWostRKvAkp1FIBL12BynE0uiicLD0PypQqLIWTc3Ix9e5T-95j7uyu7VMzrLRykKJBKm2GlJhS40s5YbD7FGuXviwHOyq0k0I7KLQ_Cu0IyQnK-_FnTH_V_1Df8Q10Aw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3122603467</pqid></control><display><type>article</type><title>Mitigating interference of microservices with a scoring mechanism in large-scale clusters</title><source>SpringerLink Journals - AutoHoldings</source><creator>Yang, Dingyu ; Zheng, Kangpeng ; Qian, Shiyou ; Hua, Qin ; Zhang, Kaixuan ; Cao, Jian ; Xue, Guangtao</creator><creatorcontrib>Yang, Dingyu ; Zheng, Kangpeng ; Qian, Shiyou ; Hua, Qin ; Zhang, Kaixuan ; Cao, Jian ; Xue, Guangtao</creatorcontrib><description>Co-locating latency-critical services (LCSs) and best-effort jobs (BEJs) constitute the principal approach for enhancing resource utilization in production clusters. Nevertheless, the co-location practice hurts the performance of LCSs due to resource competition, even when employing isolation technology. Through an extensive analysis of voluminous real trace data derived from two production clusters, we observe that BEJs typically exhibit periodic execution patterns and serve as the primary sources of interference to LCSs. 
Furthermore, despite occupying the same level of resource consumption, the diverse compositions of BEJs can result in varying degrees of interference on LCSs. Subsequently, we propose PISM, a proactive Performance Interference Scoring and Mitigating framework for LCSs through the optimization of BEJ scheduling. Firstly, PISM adopts a data-driven approach to establish a characterization and classification methodology for BEJs. Secondly, PISM models the relationship between the composition of BEJs on servers and the response time (RT) of LCSs. Thirdly, PISM establishes an interference scoring mechanism in terms of RT, which serves as the foundation for BEJ scheduling. We assess the effectiveness of PISM on a small-scale cluster and through extensive data-driven simulations. The experiment results demonstrate that PISM can reduce cluster interference by up to 41.5%, and improve the throughput of long-tail LCSs by 76.4%.</description><identifier>ISSN: 0920-8542</identifier><identifier>EISSN: 1573-0484</identifier><identifier>DOI: 10.1007/s11227-024-06534-7</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Cluster analysis ; Compilers ; Composition ; Computer Science ; Interpreters ; Performance evaluation ; Processor Architectures ; Programming Languages ; Resource utilization ; Scheduling ; Technology assessment</subject><ispartof>The Journal of supercomputing, 2025, Vol.81 (1), Article 104</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024. Springer Nature or its licensor (e.g. 
a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c200t-f632276adff0471766228504dd642b23249401da5e4a1e7655a3dc1f47d4f92a3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11227-024-06534-7$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11227-024-06534-7$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,41488,42557,51319</link.rule.ids></links><search><creatorcontrib>Yang, Dingyu</creatorcontrib><creatorcontrib>Zheng, Kangpeng</creatorcontrib><creatorcontrib>Qian, Shiyou</creatorcontrib><creatorcontrib>Hua, Qin</creatorcontrib><creatorcontrib>Zhang, Kaixuan</creatorcontrib><creatorcontrib>Cao, Jian</creatorcontrib><creatorcontrib>Xue, Guangtao</creatorcontrib><title>Mitigating interference of microservices with a scoring mechanism in large-scale clusters</title><title>The Journal of supercomputing</title><addtitle>J Supercomput</addtitle><description>Co-locating latency-critical services (LCSs) and best-effort jobs (BEJs) constitute the principal approach for enhancing resource utilization in production clusters. Nevertheless, the co-location practice hurts the performance of LCSs due to resource competition, even when employing isolation technology. Through an extensive analysis of voluminous real trace data derived from two production clusters, we observe that BEJs typically exhibit periodic execution patterns and serve as the primary sources of interference to LCSs. 
Furthermore, despite occupying the same level of resource consumption, the diverse compositions of BEJs can result in varying degrees of interference on LCSs. Subsequently, we propose PISM, a proactive Performance Interference Scoring and Mitigating framework for LCSs through the optimization of BEJ scheduling. Firstly, PISM adopts a data-driven approach to establish a characterization and classification methodology for BEJs. Secondly, PISM models the relationship between the composition of BEJs on servers and the response time (RT) of LCSs. Thirdly, PISM establishes an interference scoring mechanism in terms of RT, which serves as the foundation for BEJ scheduling. We assess the effectiveness of PISM on a small-scale cluster and through extensive data-driven simulations. The experiment results demonstrate that PISM can reduce cluster interference by up to 41.5%, and improve the throughput of long-tail LCSs by 76.4%.</description><subject>Cluster analysis</subject><subject>Compilers</subject><subject>Composition</subject><subject>Computer Science</subject><subject>Interpreters</subject><subject>Performance evaluation</subject><subject>Processor Architectures</subject><subject>Programming Languages</subject><subject>Resource utilization</subject><subject>Scheduling</subject><subject>Technology assessment</subject><issn>0920-8542</issn><issn>1573-0484</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2025</creationdate><recordtype>article</recordtype><recordid>eNp9kLtOAzEQRS0EEiHwA1SWqA3jx9pJiSJeUhANFFSW8Y4TR_sI9i6Iv2eXRaKjmuaeOzOHkHMOlxzAXGXOhTAMhGKgC6mYOSAzXhjJQC3UIZnBUgBbFEock5OcdwCgpJEz8voYu7hxXWw2NDYdpoAJG4-0DbSOPrUZ00f0mOln7LbU0ezbNIZr9FvXxFwPGK1c2iDL3lVIfdXnoSefkqPgqoxnv3NOXm5vnlf3bP1097C6XjMvADoWtBwu164MAZThRmshFgWostRKvAkp1FIBL12BynE0uiicLD0PypQqLIWTc3Ix9e5T-95j7uyu7VMzrLRykKJBKm2GlJhS40s5YbD7FGuXviwHOyq0k0I7KLQ_Cu0IyQnK-_FnTH_V_1Df8Q10Aw</recordid><startdate>2025</startdate><enddate>2025</enddate><creator>Yang, 
Dingyu</creator><creator>Zheng, Kangpeng</creator><creator>Qian, Shiyou</creator><creator>Hua, Qin</creator><creator>Zhang, Kaixuan</creator><creator>Cao, Jian</creator><creator>Xue, Guangtao</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>2025</creationdate><title>Mitigating interference of microservices with a scoring mechanism in large-scale clusters</title><author>Yang, Dingyu ; Zheng, Kangpeng ; Qian, Shiyou ; Hua, Qin ; Zhang, Kaixuan ; Cao, Jian ; Xue, Guangtao</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c200t-f632276adff0471766228504dd642b23249401da5e4a1e7655a3dc1f47d4f92a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2025</creationdate><topic>Cluster analysis</topic><topic>Compilers</topic><topic>Composition</topic><topic>Computer Science</topic><topic>Interpreters</topic><topic>Performance evaluation</topic><topic>Processor Architectures</topic><topic>Programming Languages</topic><topic>Resource utilization</topic><topic>Scheduling</topic><topic>Technology assessment</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yang, Dingyu</creatorcontrib><creatorcontrib>Zheng, Kangpeng</creatorcontrib><creatorcontrib>Qian, Shiyou</creatorcontrib><creatorcontrib>Hua, Qin</creatorcontrib><creatorcontrib>Zhang, Kaixuan</creatorcontrib><creatorcontrib>Cao, Jian</creatorcontrib><creatorcontrib>Xue, Guangtao</creatorcontrib><collection>CrossRef</collection><jtitle>The Journal of supercomputing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Yang, Dingyu</au><au>Zheng, Kangpeng</au><au>Qian, Shiyou</au><au>Hua, Qin</au><au>Zhang, Kaixuan</au><au>Cao, Jian</au><au>Xue, 
Guangtao</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Mitigating interference of microservices with a scoring mechanism in large-scale clusters</atitle><jtitle>The Journal of supercomputing</jtitle><stitle>J Supercomput</stitle><date>2025</date><risdate>2025</risdate><volume>81</volume><issue>1</issue><artnum>104</artnum><issn>0920-8542</issn><eissn>1573-0484</eissn><abstract>Co-locating latency-critical services (LCSs) and best-effort jobs (BEJs) constitute the principal approach for enhancing resource utilization in production clusters. Nevertheless, the co-location practice hurts the performance of LCSs due to resource competition, even when employing isolation technology. Through an extensive analysis of voluminous real trace data derived from two production clusters, we observe that BEJs typically exhibit periodic execution patterns and serve as the primary sources of interference to LCSs. Furthermore, despite occupying the same level of resource consumption, the diverse compositions of BEJs can result in varying degrees of interference on LCSs. Subsequently, we propose PISM, a proactive Performance Interference Scoring and Mitigating framework for LCSs through the optimization of BEJ scheduling. Firstly, PISM adopts a data-driven approach to establish a characterization and classification methodology for BEJs. Secondly, PISM models the relationship between the composition of BEJs on servers and the response time (RT) of LCSs. Thirdly, PISM establishes an interference scoring mechanism in terms of RT, which serves as the foundation for BEJ scheduling. We assess the effectiveness of PISM on a small-scale cluster and through extensive data-driven simulations. The experiment results demonstrate that PISM can reduce cluster interference by up to 41.5%, and improve the throughput of long-tail LCSs by 76.4%.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s11227-024-06534-7</doi></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0920-8542 |
ispartof | The Journal of supercomputing, 2025, Vol.81 (1), Article 104 |
issn | 0920-8542 1573-0484 |
language | eng |
recordid | cdi_proquest_journals_3122603467 |
source | SpringerLink Journals - AutoHoldings |
subjects | Cluster analysis Compilers Composition Computer Science Interpreters Performance evaluation Processor Architectures Programming Languages Resource utilization Scheduling Technology assessment |
title | Mitigating interference of microservices with a scoring mechanism in large-scale clusters |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T05%3A26%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mitigating%20interference%20of%20microservices%20with%20a%20scoring%20mechanism%20in%20large-scale%20clusters&rft.jtitle=The%20Journal%20of%20supercomputing&rft.au=Yang,%20Dingyu&rft.date=2025&rft.volume=81&rft.issue=1&rft.artnum=104&rft.issn=0920-8542&rft.eissn=1573-0484&rft_id=info:doi/10.1007/s11227-024-06534-7&rft_dat=%3Cproquest_cross%3E3122603467%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3122603467&rft_id=info:pmid/&rfr_iscdi=true |
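
The abstract describes scoring interference in terms of LCS response time (RT) and using that score to guide BEJ scheduling. The sketch below is only an illustration of that general idea, not the PISM implementation: the job classes, the per-class RT penalties, the baseline RT, the linear model, and the function names are all hypothetical placeholders chosen for this example.

```python
from collections import Counter

# Hypothetical per-class RT penalties (ms added to LCS response time per
# co-located BEJ of that class). Placeholder values, not from the paper.
RT_PENALTY_MS = {"cpu_heavy": 4.0, "io_heavy": 2.5, "light": 0.5}
# Hypothetical LCS response time (ms) on a server with no BEJs.
BASELINE_RT_MS = 20.0


def predicted_rt(composition):
    """Predict LCS response time from the BEJ class mix on one server
    (assumed linear model, for illustration only)."""
    return BASELINE_RT_MS + sum(
        RT_PENALTY_MS[cls] * count for cls, count in composition.items()
    )


def interference_score(composition):
    """Score a server as the relative RT inflation over the baseline."""
    return predicted_rt(composition) / BASELINE_RT_MS - 1.0


def place(job_class, servers):
    """Greedily place a BEJ on the server with the lowest score after placement."""
    def score_after(name):
        trial = Counter(servers[name])  # copy, so the trial does not mutate state
        trial[job_class] += 1
        return interference_score(trial)

    best = min(servers, key=score_after)
    servers[best][job_class] += 1
    return best


servers = {
    "s1": Counter({"cpu_heavy": 2}),
    "s2": Counter({"io_heavy": 1, "light": 2}),
}
chosen = place("cpu_heavy", servers)
```

Under these placeholder weights, two servers hosting the same number of BEJs can receive different scores because their class mixes differ, which mirrors the abstract's observation that composition, not just resource level, drives interference.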