EdgeSight: Enabling Modeless and Cost-Efficient Inference at the Edge

Traditional ML inference is evolving toward modeless inference, which abstracts the complexity of model selection from users, allowing the system to automatically choose the most appropriate model for each request based on accuracy and resource requirements. While prior studies have focused on model...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Lao, ChonLam, Gao, Jiaqi, Ananthanarayanan, Ganesh, Akella, Aditya, Yu, Minlan
Format:	Artikel
Sprache:	eng
Schlagworte:	Computer Science - Artificial Intelligence Computer Science - Learning Computer Science - Networking and Internet Architecture Computer Science - Systems and Control
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Traditional ML inference is evolving toward modeless inference, which abstracts the complexity of model selection from users, allowing the system to automatically choose the most appropriate model for each request based on accuracy and resource requirements. While prior studies have focused on modeless inference within data centers, this paper tackles the pressing need for cost-efficient modeless inference at the edge -- particularly within its unique constraints of limited device memory, volatile network conditions, and restricted power consumption. To overcome these challenges, we propose EdgeSight, a system that provides cost-efficient EdgeSight serving for diverse DNNs at the edge. EdgeSight employs an edge-data center (edge-DC) architecture, utilizing confidence scaling to reduce the number of model options while meeting diverse accuracy requirements. Additionally, it supports lossy inference in volatile network environments. Our experimental results show that EdgeSight outperforms existing systems by up to 1.6x in P99 latency for modeless services. Furthermore, our FPGA prototype demonstrates similar performance at certain accuracy levels, with a power consumption reduction of up to 3.34x.
DOI:	10.48550/arxiv.2405.19213