LAVA: Lifetime-Aware VM Allocation with Learned Distributions and Adaptation to Mispredictions
Format: Article
Language: English
Abstract: Scheduling virtual machines (VMs) onto hosts in cloud data centers
determines efficiency and is an NP-hard problem with incomplete information.
Prior work improved VM scheduling with predicted VM lifetimes. Our work further
improves lifetime-aware scheduling by using repredictions with lifetime
distributions rather than one-shot prediction: the approach repredicts and
adjusts VM and host lifetimes when incorrect predictions emerge. We also present
novel approaches for defragmentation and regular system maintenance, which are
essential to the reliability and optimization of our data centers and are
unexplored in prior work. We show that repredictions deliver a fundamental
advance in effectiveness over one-shot prediction.
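To illustrate why repredicting from a distribution differs from a one-shot prediction, here is a minimal Python sketch under stated assumptions, not LAVA's actual model or interface. It assumes each VM's predicted lifetime distribution is represented by a list of samples (in hours), and it treats a reprediction as the empirical conditional mean remaining lifetime, E[L - t | L > t], given the time t the VM has already run. A host's remaining lifetime is approximated as the maximum over its VMs, since a host becomes empty only when its last VM exits. The names `VM`, `lifetime_samples`, `expected_remaining_lifetime`, `host_remaining_lifetime`, and the `fallback` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class VM:
    # Hypothetical record: samples (hours) drawn from the VM's predicted
    # lifetime distribution, plus the time at which the VM started running.
    lifetime_samples: Sequence[float]
    start_time: float


def expected_remaining_lifetime(samples: Sequence[float], elapsed: float) -> Optional[float]:
    """Empirical estimate of E[L - elapsed | L > elapsed] from lifetime samples.

    Returns None when the VM has outlived every sample, i.e. the
    distribution no longer says anything about it.
    """
    survivors = [s - elapsed for s in samples if s > elapsed]
    if not survivors:
        return None
    return sum(survivors) / len(survivors)


def host_remaining_lifetime(vms: Sequence[VM], now: float, fallback: float) -> float:
    """Approximate a host's remaining lifetime as the max over its VMs.

    `fallback` (hypothetical) is substituted for VMs whose distribution is
    exhausted. Taking the max of per-VM expectations is a simplification;
    it is not the expected value of the maximum. An empty host returns 0.0.
    """
    remaining = []
    for vm in vms:
        r = expected_remaining_lifetime(vm.lifetime_samples, now - vm.start_time)
        remaining.append(fallback if r is None else r)
    return max(remaining, default=0.0)


if __name__ == "__main__":
    samples = [1, 1, 2, 10, 20]  # hypothetical lifetime samples, in hours
    print(expected_remaining_lifetime(samples, 0))  # 6.8
    print(expected_remaining_lifetime(samples, 5))  # 10.0
```

In this toy example, a one-shot estimate of 6.8 hours is already stale once the VM has run for 5 hours; conditioning on survival yields 10.0 further hours, because the VM most likely belongs to the long-lived tail of its distribution.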
We call our novel combination of distribution-based lifetime predictions and
scheduling algorithms Lifetime-Aware VM Allocation (LAVA). LAVA reduces resource
stranding and increases the number of empty hosts, both of which are critical
for scheduling large VMs, performing cloud system updates, and reducing dynamic
energy consumption. Our approach runs in production within Google's hyperscale
cloud data centers, where it improves efficiency by decreasing stranded compute
and memory resources by approximately 3% and 2%, respectively, and increases
availability for large VMs and cloud system updates by increasing the share of
empty hosts by 2.3 to 9.2 percentage points. We also show a reduction in VM
migrations for host defragmentation and maintenance. In addition to our
fleet-wide production deployment, we perform simulation studies to characterize
the design space and show that our algorithm significantly outperforms the
state-of-the-art lifetime-based scheduling approach.
DOI: 10.48550/arxiv.2412.09840