Adaptive Two-Stage Cloud Resource Scaling via Hierarchical Multi-Indicator Forecasting and Bayesian Decision-Making
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | The surging demand for cloud computing resources, driven by the rapid growth
of sophisticated large-scale models and data centers, underscores the critical
importance of efficient and adaptive resource allocation. As major tech
enterprises deploy massive infrastructures with thousands of GPUs, existing
cloud platforms still struggle with low resource utilization due to key
challenges: capturing hierarchical indicator structures, modeling non-Gaussian
distributions, and decision-making under uncertainty. To address these
challenges, we propose HARMONY, an adaptive Hierarchical Attention-based
Resource Modeling and Decision-Making System. HARMONY combines hierarchical
multi-indicator distribution forecasting and uncertainty-aware Bayesian
decision-making. It introduces a novel hierarchical attention mechanism that
comprehensively models complex inter-indicator dependencies, enabling accurate
predictions that can adapt to evolving environment states. It transforms
Gaussian projections into adaptive non-Gaussian distributions via Normalizing
Flows. Crucially, HARMONY leverages the full predictive distributions in an
adaptive Bayesian process, proactively incorporating uncertainties to optimize
resource allocation while robustly meeting SLA constraints under varying
conditions. Extensive evaluations across four large-scale cloud datasets
demonstrate HARMONY's state-of-the-art performance, significantly outperforming
nine established methods. A month-long real-world deployment validated
HARMONY's substantial practical impact, realizing over 35,000 GPU hours in
savings and translating to $100K+ in cost reduction, showcasing its remarkable
economic value through adaptive, uncertainty-aware scaling. Our code is
available at https://github.com/Floating-LY/HARMONY1. |
---|---|
DOI: | 10.48550/arxiv.2408.01000 |
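
The summary describes using a full, possibly non-Gaussian predictive distribution to choose an allocation that still meets SLA constraints. The sketch below illustrates that general idea only. It is not HARMONY's method: it swaps in a plain empirical-quantile rule for the Bayesian decision process, and the log-normal demand model, the function names (`sample_demand`, `allocate`, `violation_rate`), and the 95% SLA level are all assumptions made for illustration.

```python
import math
import random


def sample_demand(n, seed=0):
    """Draw n hypothetical demand samples from a skewed (log-normal) distribution.

    Stands in for samples from a learned predictive distribution (assumption).
    """
    rng = random.Random(seed)
    return [rng.lognormvariate(math.log(100.0), 0.3) for _ in range(n)]


def allocate(samples, sla_level=0.95):
    """Return the smallest sampled value covering at least sla_level of the samples."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0.0 < sla_level <= 1.0:
        raise ValueError("sla_level must be in (0, 1]")
    ordered = sorted(samples)
    # Index of the empirical quantile: ceil(p * n) samples lie at or below it.
    k = math.ceil(sla_level * len(ordered)) - 1
    return ordered[k]


def violation_rate(samples, allocation):
    """Fraction of samples whose demand exceeds the allocation."""
    return sum(d > allocation for d in samples) / len(samples)


if __name__ == "__main__":
    demand = sample_demand(1000)
    alloc = allocate(demand, sla_level=0.95)
    print(f"allocation: {alloc:.1f}")
    print(f"violation rate: {violation_rate(demand, alloc):.3f}")
```

Because the rule works on samples rather than a mean and variance, it makes no Gaussian assumption about demand. A tighter SLA level raises the allocation, which is the trade-off between utilization and SLA violations that the summary refers to.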