EdgeSight: Enabling Modeless and Cost-Efficient Inference at the Edge
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Abstract: | Traditional ML inference is evolving toward modeless inference, which
abstracts the complexity of model selection from users, allowing the system to
automatically choose the most appropriate model for each request based on
accuracy and resource requirements. While prior studies have focused on
modeless inference within data centers, this paper tackles the pressing need
for cost-efficient modeless inference at the edge, particularly within its
unique constraints of limited device memory, volatile network conditions, and
restricted power consumption.
To overcome these challenges, we propose EdgeSight, a system that provides
cost-efficient modeless serving for diverse DNNs at the edge. EdgeSight
employs an edge-data center (edge-DC) architecture, utilizing confidence
scaling to reduce the number of model options while meeting diverse accuracy
requirements. Additionally, it supports lossy inference in volatile network
environments. Our experimental results show that EdgeSight outperforms existing
systems by up to 1.6x in P99 latency for modeless services. Furthermore, our
FPGA prototype demonstrates similar performance at certain accuracy levels,
with a power consumption reduction of up to 3.34x. |
---|---|
DOI: | 10.48550/arxiv.2405.19213 |
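
The abstract describes modeless inference as the system choosing a model for each request from its accuracy and resource requirements. The sketch below illustrates that general idea only; it is not EdgeSight's algorithm, and it does not cover confidence scaling or lossy inference. The `ModelVariant` type, the `select_model` function, the catalog entries, and all numbers are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelVariant:
    """One deployable model option (all fields are illustrative assumptions)."""
    name: str
    accuracy: float    # fraction in [0, 1]
    memory_mb: int     # device memory needed to load the model
    latency_ms: float  # expected per-request latency


def select_model(
    variants: Sequence[ModelVariant],
    min_accuracy: float,
    memory_budget_mb: int,
) -> Optional[ModelVariant]:
    """Return the lowest-latency variant that meets the accuracy target
    and fits the memory budget, or None if no variant qualifies."""
    feasible = [
        v for v in variants
        if v.accuracy >= min_accuracy and v.memory_mb <= memory_budget_mb
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda v: v.latency_ms)


# Hypothetical catalog; the values are made up, not measurements.
CATALOG = [
    ModelVariant("small", accuracy=0.70, memory_mb=20, latency_ms=5.0),
    ModelVariant("medium", accuracy=0.76, memory_mb=100, latency_ms=12.0),
    ModelVariant("large", accuracy=0.80, memory_mb=350, latency_ms=30.0),
]

if __name__ == "__main__":
    # The caller states requirements only; the model name is never specified.
    choice = select_model(CATALOG, min_accuracy=0.75, memory_budget_mb=128)
    print(choice.name if choice else "no feasible model")
```

Under these assumptions the caller supplies only an accuracy target and a memory budget. When no variant qualifies, the function returns `None`, which a real system might handle by offloading the request to the data center.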