Mitigating interference of microservices with a scoring mechanism in large-scale clusters

Co-locating latency-critical services (LCSs) and best-effort jobs (BEJs) constitutes the principal approach for enhancing resource utilization in production clusters. Nevertheless, the co-location practice hurts the performance of LCSs due to resource competition, even when employing isolation techno...

Detailed description

Bibliographic details
Published in: The Journal of supercomputing 2025, Vol.81 (1), Article 104
Main authors: Yang, Dingyu, Zheng, Kangpeng, Qian, Shiyou, Hua, Qin, Zhang, Kaixuan, Cao, Jian, Xue, Guangtao
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue 1
container_start_page
container_title The Journal of supercomputing
container_volume 81
creator Yang, Dingyu
Zheng, Kangpeng
Qian, Shiyou
Hua, Qin
Zhang, Kaixuan
Cao, Jian
Xue, Guangtao
description Co-locating latency-critical services (LCSs) and best-effort jobs (BEJs) constitutes the principal approach for enhancing resource utilization in production clusters. Nevertheless, the co-location practice hurts the performance of LCSs due to resource competition, even when employing isolation technology. Through an extensive analysis of large volumes of real trace data derived from two production clusters, we observe that BEJs typically exhibit periodic execution patterns and serve as the primary sources of interference to LCSs. Furthermore, even at the same level of resource consumption, different compositions of BEJs can result in varying degrees of interference on LCSs. Subsequently, we propose PISM, a proactive Performance Interference Scoring and Mitigating framework for LCSs through the optimization of BEJ scheduling. Firstly, PISM adopts a data-driven approach to establish a characterization and classification methodology for BEJs. Secondly, PISM models the relationship between the composition of BEJs on servers and the response time (RT) of LCSs. Thirdly, PISM establishes an interference scoring mechanism in terms of RT, which serves as the foundation for BEJ scheduling. We assess the effectiveness of PISM on a small-scale cluster and through extensive data-driven simulations. The experimental results demonstrate that PISM can reduce cluster interference by up to 41.5%, and improve the throughput of long-tail LCSs by 76.4%.
doi_str_mv 10.1007/s11227-024-06534-7
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3122603467</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3122603467</sourcerecordid><originalsourceid>FETCH-LOGICAL-c200t-f632276adff0471766228504dd642b23249401da5e4a1e7655a3dc1f47d4f92a3</originalsourceid><addsrcrecordid>eNp9kLtOAzEQRS0EEiHwA1SWqA3jx9pJiSJeUhANFFSW8Y4TR_sI9i6Iv2eXRaKjmuaeOzOHkHMOlxzAXGXOhTAMhGKgC6mYOSAzXhjJQC3UIZnBUgBbFEock5OcdwCgpJEz8voYu7hxXWw2NDYdpoAJG4-0DbSOPrUZ00f0mOln7LbU0ezbNIZr9FvXxFwPGK1c2iDL3lVIfdXnoSefkqPgqoxnv3NOXm5vnlf3bP1097C6XjMvADoWtBwu164MAZThRmshFgWostRKvAkp1FIBL12BynE0uiicLD0PypQqLIWTc3Ix9e5T-95j7uyu7VMzrLRykKJBKm2GlJhS40s5YbD7FGuXviwHOyq0k0I7KLQ_Cu0IyQnK-_FnTH_V_1Df8Q10Aw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3122603467</pqid></control><display><type>article</type><title>Mitigating interference of microservices with a scoring mechanism in large-scale clusters</title><source>SpringerLink Journals - AutoHoldings</source><creator>Yang, Dingyu ; Zheng, Kangpeng ; Qian, Shiyou ; Hua, Qin ; Zhang, Kaixuan ; Cao, Jian ; Xue, Guangtao</creator><creatorcontrib>Yang, Dingyu ; Zheng, Kangpeng ; Qian, Shiyou ; Hua, Qin ; Zhang, Kaixuan ; Cao, Jian ; Xue, Guangtao</creatorcontrib><description>Co-locating latency-critical services (LCSs) and best-effort jobs (BEJs) constitute the principal approach for enhancing resource utilization in production clusters. Nevertheless, the co-location practice hurts the performance of LCSs due to resource competition, even when employing isolation technology. Through an extensive analysis of voluminous real trace data derived from two production clusters, we observe that BEJs typically exhibit periodic execution patterns and serve as the primary sources of interference to LCSs. 
Furthermore, despite occupying the same level of resource consumption, the diverse compositions of BEJs can result in varying degrees of interference on LCSs. Subsequently, we propose PISM, a proactive Performance Interference Scoring and Mitigating framework for LCSs through the optimization of BEJ scheduling. Firstly, PISM adopts a data-driven approach to establish a characterization and classification methodology for BEJs. Secondly, PISM models the relationship between the composition of BEJs on servers and the response time (RT) of LCSs. Thirdly, PISM establishes an interference scoring mechanism in terms of RT, which serves as the foundation for BEJ scheduling. We assess the effectiveness of PISM on a small-scale cluster and through extensive data-driven simulations. The experiment results demonstrate that PISM can reduce cluster interference by up to 41.5%, and improve the throughput of long-tail LCSs by 76.4%.</description><identifier>ISSN: 0920-8542</identifier><identifier>EISSN: 1573-0484</identifier><identifier>DOI: 10.1007/s11227-024-06534-7</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Cluster analysis ; Compilers ; Composition ; Computer Science ; Interpreters ; Performance evaluation ; Processor Architectures ; Programming Languages ; Resource utilization ; Scheduling ; Technology assessment</subject><ispartof>The Journal of supercomputing, 2025, Vol.81 (1), Article 104</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024. Springer Nature or its licensor (e.g. 
a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c200t-f632276adff0471766228504dd642b23249401da5e4a1e7655a3dc1f47d4f92a3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11227-024-06534-7$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11227-024-06534-7$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,41488,42557,51319</link.rule.ids></links><search><creatorcontrib>Yang, Dingyu</creatorcontrib><creatorcontrib>Zheng, Kangpeng</creatorcontrib><creatorcontrib>Qian, Shiyou</creatorcontrib><creatorcontrib>Hua, Qin</creatorcontrib><creatorcontrib>Zhang, Kaixuan</creatorcontrib><creatorcontrib>Cao, Jian</creatorcontrib><creatorcontrib>Xue, Guangtao</creatorcontrib><title>Mitigating interference of microservices with a scoring mechanism in large-scale clusters</title><title>The Journal of supercomputing</title><addtitle>J Supercomput</addtitle><description>Co-locating latency-critical services (LCSs) and best-effort jobs (BEJs) constitute the principal approach for enhancing resource utilization in production clusters. Nevertheless, the co-location practice hurts the performance of LCSs due to resource competition, even when employing isolation technology. Through an extensive analysis of voluminous real trace data derived from two production clusters, we observe that BEJs typically exhibit periodic execution patterns and serve as the primary sources of interference to LCSs. 
Furthermore, despite occupying the same level of resource consumption, the diverse compositions of BEJs can result in varying degrees of interference on LCSs. Subsequently, we propose PISM, a proactive Performance Interference Scoring and Mitigating framework for LCSs through the optimization of BEJ scheduling. Firstly, PISM adopts a data-driven approach to establish a characterization and classification methodology for BEJs. Secondly, PISM models the relationship between the composition of BEJs on servers and the response time (RT) of LCSs. Thirdly, PISM establishes an interference scoring mechanism in terms of RT, which serves as the foundation for BEJ scheduling. We assess the effectiveness of PISM on a small-scale cluster and through extensive data-driven simulations. The experiment results demonstrate that PISM can reduce cluster interference by up to 41.5%, and improve the throughput of long-tail LCSs by 76.4%.</description><subject>Cluster analysis</subject><subject>Compilers</subject><subject>Composition</subject><subject>Computer Science</subject><subject>Interpreters</subject><subject>Performance evaluation</subject><subject>Processor Architectures</subject><subject>Programming Languages</subject><subject>Resource utilization</subject><subject>Scheduling</subject><subject>Technology assessment</subject><issn>0920-8542</issn><issn>1573-0484</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2025</creationdate><recordtype>article</recordtype><recordid>eNp9kLtOAzEQRS0EEiHwA1SWqA3jx9pJiSJeUhANFFSW8Y4TR_sI9i6Iv2eXRaKjmuaeOzOHkHMOlxzAXGXOhTAMhGKgC6mYOSAzXhjJQC3UIZnBUgBbFEock5OcdwCgpJEz8voYu7hxXWw2NDYdpoAJG4-0DbSOPrUZ00f0mOln7LbU0ezbNIZr9FvXxFwPGK1c2iDL3lVIfdXnoSefkqPgqoxnv3NOXm5vnlf3bP1097C6XjMvADoWtBwu164MAZThRmshFgWostRKvAkp1FIBL12BynE0uiicLD0PypQqLIWTc3Ix9e5T-95j7uyu7VMzrLRykKJBKm2GlJhS40s5YbD7FGuXviwHOyq0k0I7KLQ_Cu0IyQnK-_FnTH_V_1Df8Q10Aw</recordid><startdate>2025</startdate><enddate>2025</enddate><creator>Yang, 
Dingyu</creator><creator>Zheng, Kangpeng</creator><creator>Qian, Shiyou</creator><creator>Hua, Qin</creator><creator>Zhang, Kaixuan</creator><creator>Cao, Jian</creator><creator>Xue, Guangtao</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>2025</creationdate><title>Mitigating interference of microservices with a scoring mechanism in large-scale clusters</title><author>Yang, Dingyu ; Zheng, Kangpeng ; Qian, Shiyou ; Hua, Qin ; Zhang, Kaixuan ; Cao, Jian ; Xue, Guangtao</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c200t-f632276adff0471766228504dd642b23249401da5e4a1e7655a3dc1f47d4f92a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2025</creationdate><topic>Cluster analysis</topic><topic>Compilers</topic><topic>Composition</topic><topic>Computer Science</topic><topic>Interpreters</topic><topic>Performance evaluation</topic><topic>Processor Architectures</topic><topic>Programming Languages</topic><topic>Resource utilization</topic><topic>Scheduling</topic><topic>Technology assessment</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yang, Dingyu</creatorcontrib><creatorcontrib>Zheng, Kangpeng</creatorcontrib><creatorcontrib>Qian, Shiyou</creatorcontrib><creatorcontrib>Hua, Qin</creatorcontrib><creatorcontrib>Zhang, Kaixuan</creatorcontrib><creatorcontrib>Cao, Jian</creatorcontrib><creatorcontrib>Xue, Guangtao</creatorcontrib><collection>CrossRef</collection><jtitle>The Journal of supercomputing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Yang, Dingyu</au><au>Zheng, Kangpeng</au><au>Qian, Shiyou</au><au>Hua, Qin</au><au>Zhang, Kaixuan</au><au>Cao, Jian</au><au>Xue, 
Guangtao</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Mitigating interference of microservices with a scoring mechanism in large-scale clusters</atitle><jtitle>The Journal of supercomputing</jtitle><stitle>J Supercomput</stitle><date>2025</date><risdate>2025</risdate><volume>81</volume><issue>1</issue><artnum>104</artnum><issn>0920-8542</issn><eissn>1573-0484</eissn><abstract>Co-locating latency-critical services (LCSs) and best-effort jobs (BEJs) constitute the principal approach for enhancing resource utilization in production clusters. Nevertheless, the co-location practice hurts the performance of LCSs due to resource competition, even when employing isolation technology. Through an extensive analysis of voluminous real trace data derived from two production clusters, we observe that BEJs typically exhibit periodic execution patterns and serve as the primary sources of interference to LCSs. Furthermore, despite occupying the same level of resource consumption, the diverse compositions of BEJs can result in varying degrees of interference on LCSs. Subsequently, we propose PISM, a proactive Performance Interference Scoring and Mitigating framework for LCSs through the optimization of BEJ scheduling. Firstly, PISM adopts a data-driven approach to establish a characterization and classification methodology for BEJs. Secondly, PISM models the relationship between the composition of BEJs on servers and the response time (RT) of LCSs. Thirdly, PISM establishes an interference scoring mechanism in terms of RT, which serves as the foundation for BEJ scheduling. We assess the effectiveness of PISM on a small-scale cluster and through extensive data-driven simulations. The experiment results demonstrate that PISM can reduce cluster interference by up to 41.5%, and improve the throughput of long-tail LCSs by 76.4%.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s11227-024-06534-7</doi></addata></record>
fulltext fulltext
identifier ISSN: 0920-8542
ispartof The Journal of supercomputing, 2025, Vol.81 (1), Article 104
issn 0920-8542
1573-0484
language eng
recordid cdi_proquest_journals_3122603467
source SpringerLink Journals - AutoHoldings
subjects Cluster analysis
Compilers
Composition
Computer Science
Interpreters
Performance evaluation
Processor Architectures
Programming Languages
Resource utilization
Scheduling
Technology assessment
title Mitigating interference of microservices with a scoring mechanism in large-scale clusters
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T05%3A26%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mitigating%20interference%20of%20microservices%20with%20a%20scoring%20mechanism%20in%20large-scale%20clusters&rft.jtitle=The%20Journal%20of%20supercomputing&rft.au=Yang,%20Dingyu&rft.date=2025&rft.volume=81&rft.issue=1&rft.artnum=104&rft.issn=0920-8542&rft.eissn=1573-0484&rft_id=info:doi/10.1007/s11227-024-06534-7&rft_dat=%3Cproquest_cross%3E3122603467%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3122603467&rft_id=info:pmid/&rfr_iscdi=true