Arcus: SLO Management for Accelerators in the Cloud with Traffic Shaping
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Cloud servers use accelerators for common tasks (e.g., encryption,
compression, hashing) to improve CPU/GPU efficiency and overall performance.
However, users' Service-level Objectives (SLOs) can be violated due to
accelerator-related contention. The root cause is that existing solutions for
accelerators focus only on isolation or fair allocation of compute and memory
resources; they overlook contention for communication-related resources.
Specifically, three communication-induced challenges drive us to rethink the
problem: (1) accelerator traffic patterns are diverse, hard to predict, and
mixed across users; (2) communication-related components lack effective
low-level isolation mechanisms to configure; and (3) the computational
heterogeneity of accelerators leads to unique relationships between the traffic
mixture and the corresponding accelerator performance. The focus of this work
is meeting SLOs in accelerator-rich systems. We present Arcus, which treats
accelerator SLO management as traffic management with proactive traffic
shaping. We develop an SLO-aware protocol coupled with an offloaded interface
on an architecture that supports precise and scalable traffic shaping. We
guarantee accelerator SLOs under various circumstances, with up to 45% tail
latency reduction and less than 1% throughput variance. |
---|---|
DOI: | 10.48550/arxiv.2410.17577 |