LAVA: Lifetime-Aware VM Allocation with Learned Distributions and Adaptation to Mispredictions

Scheduling virtual machines (VMs) to hosts in cloud data centers dictates efficiency and is an NP-hard problem with incomplete information. Prior work improved VM scheduling with predicted VM lifetimes. Our work further improves lifetime-aware scheduling using repredictions with lifetime distributio...

Bibliographic Details
Main Authors:	Ling, Jianheng, Worah, Pratik, Wang, Yawen, Kong, Yunchuan, Wang, Chunlei, Stein, Clifford, Gupta, Diwakar, Behmer, Jason, Bush, Logan A, Ramanan, Prakash, Kumar, Rajesh, Chestna, Thomas, Liu, Yajing, Liu, Ying, Zhao, Ye, McKinley, Kathryn S, Park, Meeyoung, Maas, Martin
Format:	Article
Language:	eng
Subjects:	Computer Science - Distributed, Parallel, and Cluster Computing

creator	Ling, Jianheng ; Worah, Pratik ; Wang, Yawen ; Kong, Yunchuan ; Wang, Chunlei ; Stein, Clifford ; Gupta, Diwakar ; Behmer, Jason ; Bush, Logan A ; Ramanan, Prakash ; Kumar, Rajesh ; Chestna, Thomas ; Liu, Yajing ; Liu, Ying ; Zhao, Ye ; McKinley, Kathryn S ; Park, Meeyoung ; Maas, Martin
description	Scheduling virtual machines (VMs) to hosts in cloud data centers dictates efficiency and is an NP-hard problem with incomplete information. Prior work improved VM scheduling with predicted VM lifetimes. Our work further improves lifetime-aware scheduling using repredictions with lifetime distributions vs. one-shot prediction. The approach repredicts and adjusts VM and host lifetimes when incorrect predictions emerge. We also present novel approaches for defragmentation and regular system maintenance, which are essential to our data center reliability and optimizations, and are unexplored in prior work. We show that repredictions deliver a fundamental advance in effectiveness over one-shot prediction. We call our novel combination of distribution-based lifetime predictions and scheduling algorithms Lifetime Aware VM Allocation (LAVA). LAVA improves resource stranding and the number of empty hosts, which are critical for large VM scheduling, cloud system updates, and reducing dynamic energy consumption. Our approach runs in production within Google's hyperscale cloud data centers, where it improves efficiency by decreasing stranded compute and memory resources by ~3% and ~2% respectively, and increases availability for large VMs and cloud system updates by increasing empty hosts by 2.3-9.2 pp in production. We also show a reduction in VM migrations for host defragmentation and maintenance. In addition to our fleet-wide production deployment, we perform simulation studies to characterize the design space and show that our algorithm significantly outperforms the state of the art lifetime-based scheduling approach.
doi_str_mv	10.48550/arxiv.2412.09840
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2412_09840</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2412_09840</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2412_098403</originalsourceid><addsrcrecordid>eNqFzr0OgjAUhuEuDka9ACfPDYAFIUG3xp84wGYYbY60xJPwl7aK3r0B3Z2-4XuHh7FlwP0oiWO-RvOipx9GQejzbRLxKbumIhc7SKnUjmrtiR6NhjwDUVVtgY7aBnpyd0g1mkYrOJB1hm6P4bGAjQKhsHPf0rWQke2MVlSMwZxNSqysXvx2xlan42V_9kaJ7AzVaN5yEMlRtPlffABaz0FR</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>LAVA: Lifetime-Aware VM Allocation with Learned Distributions and Adaptation to Mispredictions</title><source>arXiv.org</source><creator>Ling, Jianheng ; Worah, Pratik ; Wang, Yawen ; Kong, Yunchuan ; Wang, Chunlei ; Stein, Clifford ; Gupta, Diwakar ; Behmer, Jason ; Bush, Logan A ; Ramanan, Prakash ; Kumar, Rajesh ; Chestna, Thomas ; Liu, Yajing ; Liu, Ying ; Zhao, Ye ; McKinley, Kathryn S ; Park, Meeyoung ; Maas, Martin</creator><creatorcontrib>Ling, Jianheng ; Worah, Pratik ; Wang, Yawen ; Kong, Yunchuan ; Wang, Chunlei ; Stein, Clifford ; Gupta, Diwakar ; Behmer, Jason ; Bush, Logan A ; Ramanan, Prakash ; Kumar, Rajesh ; Chestna, Thomas ; Liu, Yajing ; Liu, Ying ; Zhao, Ye ; McKinley, Kathryn S ; Park, Meeyoung ; Maas, Martin</creatorcontrib><description>Scheduling virtual machines (VMs) to hosts in cloud data centers dictates efficiency and is an NP-hard problem with incomplete information. Prior work improved VM scheduling with predicted VM lifetimes. Our work further improves lifetime-aware scheduling using repredictions with lifetime distributions vs. one-shot prediction. The approach repredicts and adjusts VM and host lifetimes when incorrect predictions emerge. 
We also present novel approaches for defragmentation and regular system maintenance, which are essential to our data center reliability and optimizations, and are unexplored in prior work. We show that repredictions deliver a fundamental advance in effectiveness over one-shot prediction. We call our novel combination of distribution-based lifetime predictions and scheduling algorithms Lifetime Aware VM Allocation (LAVA). LAVA improves resource stranding and the number of empty hosts, which are critical for large VM scheduling, cloud system updates, and reducing dynamic energy consumption. Our approach runs in production within Google's hyperscale cloud data centers, where it improves efficiency by decreasing stranded compute and memory resources by ~3% and ~2% respectively, and increases availability for large VMs and cloud system updates by increasing empty hosts by 2.3-9.2 pp in production. We also show a reduction in VM migrations for host defragmentation and maintenance. In addition to our fleet-wide production deployment, we perform simulation studies to characterize the design space and show that our algorithm significantly outperforms the state of the art lifetime-based scheduling approach.</description><identifier>DOI: 10.48550/arxiv.2412.09840</identifier><language>eng</language><subject>Computer Science - Distributed, Parallel, and Cluster Computing</subject><creationdate>2024-12</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2412.09840$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2412.09840$$DView paper in 
arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Ling, Jianheng</creatorcontrib><creatorcontrib>Worah, Pratik</creatorcontrib><creatorcontrib>Wang, Yawen</creatorcontrib><creatorcontrib>Kong, Yunchuan</creatorcontrib><creatorcontrib>Wang, Chunlei</creatorcontrib><creatorcontrib>Stein, Clifford</creatorcontrib><creatorcontrib>Gupta, Diwakar</creatorcontrib><creatorcontrib>Behmer, Jason</creatorcontrib><creatorcontrib>Bush, Logan A</creatorcontrib><creatorcontrib>Ramanan, Prakash</creatorcontrib><creatorcontrib>Kumar, Rajesh</creatorcontrib><creatorcontrib>Chestna, Thomas</creatorcontrib><creatorcontrib>Liu, Yajing</creatorcontrib><creatorcontrib>Liu, Ying</creatorcontrib><creatorcontrib>Zhao, Ye</creatorcontrib><creatorcontrib>McKinley, Kathryn S</creatorcontrib><creatorcontrib>Park, Meeyoung</creatorcontrib><creatorcontrib>Maas, Martin</creatorcontrib><title>LAVA: Lifetime-Aware VM Allocation with Learned Distributions and Adaptation to Mispredictions</title><description>Scheduling virtual machines (VMs) to hosts in cloud data centers dictates efficiency and is an NP-hard problem with incomplete information. Prior work improved VM scheduling with predicted VM lifetimes. Our work further improves lifetime-aware scheduling using repredictions with lifetime distributions vs. one-shot prediction. The approach repredicts and adjusts VM and host lifetimes when incorrect predictions emerge. We also present novel approaches for defragmentation and regular system maintenance, which are essential to our data center reliability and optimizations, and are unexplored in prior work. We show that repredictions deliver a fundamental advance in effectiveness over one-shot prediction. We call our novel combination of distribution-based lifetime predictions and scheduling algorithms Lifetime Aware VM Allocation (LAVA). 
LAVA improves resource stranding and the number of empty hosts, which are critical for large VM scheduling, cloud system updates, and reducing dynamic energy consumption. Our approach runs in production within Google's hyperscale cloud data centers, where it improves efficiency by decreasing stranded compute and memory resources by ~3% and ~2% respectively, and increases availability for large VMs and cloud system updates by increasing empty hosts by 2.3-9.2 pp in production. We also show a reduction in VM migrations for host defragmentation and maintenance. In addition to our fleet-wide production deployment, we perform simulation studies to characterize the design space and show that our algorithm significantly outperforms the state of the art lifetime-based scheduling approach.</description><subject>Computer Science - Distributed, Parallel, and Cluster Computing</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNqFzr0OgjAUhuEuDka9ACfPDYAFIUG3xp84wGYYbY60xJPwl7aK3r0B3Z2-4XuHh7FlwP0oiWO-RvOipx9GQejzbRLxKbumIhc7SKnUjmrtiR6NhjwDUVVtgY7aBnpyd0g1mkYrOJB1hm6P4bGAjQKhsHPf0rWQke2MVlSMwZxNSqysXvx2xlan42V_9kaJ7AzVaN5yEMlRtPlffABaz0FR</recordid><startdate>20241212</startdate><enddate>20241212</enddate><creator>Ling, Jianheng</creator><creator>Worah, Pratik</creator><creator>Wang, Yawen</creator><creator>Kong, Yunchuan</creator><creator>Wang, Chunlei</creator><creator>Stein, Clifford</creator><creator>Gupta, Diwakar</creator><creator>Behmer, Jason</creator><creator>Bush, Logan A</creator><creator>Ramanan, Prakash</creator><creator>Kumar, Rajesh</creator><creator>Chestna, Thomas</creator><creator>Liu, Yajing</creator><creator>Liu, Ying</creator><creator>Zhao, Ye</creator><creator>McKinley, Kathryn S</creator><creator>Park, Meeyoung</creator><creator>Maas, 
Martin</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20241212</creationdate><title>LAVA: Lifetime-Aware VM Allocation with Learned Distributions and Adaptation to Mispredictions</title><author>Ling, Jianheng ; Worah, Pratik ; Wang, Yawen ; Kong, Yunchuan ; Wang, Chunlei ; Stein, Clifford ; Gupta, Diwakar ; Behmer, Jason ; Bush, Logan A ; Ramanan, Prakash ; Kumar, Rajesh ; Chestna, Thomas ; Liu, Yajing ; Liu, Ying ; Zhao, Ye ; McKinley, Kathryn S ; Park, Meeyoung ; Maas, Martin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2412_098403</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Distributed, Parallel, and Cluster Computing</topic><toplevel>online_resources</toplevel><creatorcontrib>Ling, Jianheng</creatorcontrib><creatorcontrib>Worah, Pratik</creatorcontrib><creatorcontrib>Wang, Yawen</creatorcontrib><creatorcontrib>Kong, Yunchuan</creatorcontrib><creatorcontrib>Wang, Chunlei</creatorcontrib><creatorcontrib>Stein, Clifford</creatorcontrib><creatorcontrib>Gupta, Diwakar</creatorcontrib><creatorcontrib>Behmer, Jason</creatorcontrib><creatorcontrib>Bush, Logan A</creatorcontrib><creatorcontrib>Ramanan, Prakash</creatorcontrib><creatorcontrib>Kumar, Rajesh</creatorcontrib><creatorcontrib>Chestna, Thomas</creatorcontrib><creatorcontrib>Liu, Yajing</creatorcontrib><creatorcontrib>Liu, Ying</creatorcontrib><creatorcontrib>Zhao, Ye</creatorcontrib><creatorcontrib>McKinley, Kathryn S</creatorcontrib><creatorcontrib>Park, Meeyoung</creatorcontrib><creatorcontrib>Maas, Martin</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ling, Jianheng</au><au>Worah, Pratik</au><au>Wang, Yawen</au><au>Kong, Yunchuan</au><au>Wang, 
Chunlei</au><au>Stein, Clifford</au><au>Gupta, Diwakar</au><au>Behmer, Jason</au><au>Bush, Logan A</au><au>Ramanan, Prakash</au><au>Kumar, Rajesh</au><au>Chestna, Thomas</au><au>Liu, Yajing</au><au>Liu, Ying</au><au>Zhao, Ye</au><au>McKinley, Kathryn S</au><au>Park, Meeyoung</au><au>Maas, Martin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>LAVA: Lifetime-Aware VM Allocation with Learned Distributions and Adaptation to Mispredictions</atitle><date>2024-12-12</date><risdate>2024</risdate><abstract>Scheduling virtual machines (VMs) to hosts in cloud data centers dictates efficiency and is an NP-hard problem with incomplete information. Prior work improved VM scheduling with predicted VM lifetimes. Our work further improves lifetime-aware scheduling using repredictions with lifetime distributions vs. one-shot prediction. The approach repredicts and adjusts VM and host lifetimes when incorrect predictions emerge. We also present novel approaches for defragmentation and regular system maintenance, which are essential to our data center reliability and optimizations, and are unexplored in prior work. We show that repredictions deliver a fundamental advance in effectiveness over one-shot prediction. We call our novel combination of distribution-based lifetime predictions and scheduling algorithms Lifetime Aware VM Allocation (LAVA). LAVA improves resource stranding and the number of empty hosts, which are critical for large VM scheduling, cloud system updates, and reducing dynamic energy consumption. Our approach runs in production within Google's hyperscale cloud data centers, where it improves efficiency by decreasing stranded compute and memory resources by ~3% and ~2% respectively, and increases availability for large VMs and cloud system updates by increasing empty hosts by 2.3-9.2 pp in production. We also show a reduction in VM migrations for host defragmentation and maintenance. 
In addition to our fleet-wide production deployment, we perform simulation studies to characterize the design space and show that our algorithm significantly outperforms the state of the art lifetime-based scheduling approach.</abstract><doi>10.48550/arxiv.2412.09840</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2412.09840
language	eng
recordid	cdi_arxiv_primary_2412_09840
source	arXiv.org
subjects	Computer Science - Distributed, Parallel, and Cluster Computing
title	LAVA: Lifetime-Aware VM Allocation with Learned Distributions and Adaptation to Mispredictions
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-15T01%3A34%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=LAVA:%20Lifetime-Aware%20VM%20Allocation%20with%20Learned%20Distributions%20and%20Adaptation%20to%20Mispredictions&rft.au=Ling,%20Jianheng&rft.date=2024-12-12&rft_id=info:doi/10.48550/arxiv.2412.09840&rft_dat=%3Carxiv_GOX%3E2412_09840%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true