Automated Backend Allocation for Multi-Model, On-Device AI Inference



Bibliographic Details
Published in: Performance Evaluation Review 2024-06, Vol. 52 (1), p. 27-28
Main Authors: Iyer, Venkatraman; Lee, Sungho; Lee, Semun; Kim, Juitem Joonwoo; Kim, Hyunjun; Shin, Youngjae
Format: Article
Language: English
Online Access: Full text
Description
Abstract: On-Device Artificial Intelligence (AI) services such as face recognition, object tracking, and voice recognition are rapidly scaling up deployments on embedded, memory-constrained hardware devices. These services typically delegate AI inference models for execution on CPU and GPU computing backends. While GPU delegation is a common practice to achieve high-speed computation, the approach suffers from degraded throughput and completion times under multi-model scenarios, i.e., concurrently executing services. This paper introduces a solution to sustain performance in multi-model, on-device AI contexts by dynamically allocating a combination of CPU and GPU backends per model. The allocation is feedback-driven and guided by knowledge of model-specific, multi-objective Pareto fronts comprising inference latency and memory consumption. Our backend allocation algorithm runs online per model and achieves a 25-100% improvement in throughput over static allocations as well as load-balancing scheduler solutions targeting multi-model scenarios.
ISSN: 0163-5999
eISSN: 1557-9484
DOI: 10.1145/3673660.3655046
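
The abstract describes a feedback-driven allocator that picks a CPU/GPU backend combination per model from a Pareto front of inference latency and memory consumption. As a rough illustration of that general idea only, here is a minimal Python sketch; the `BackendConfig` type, the profiled numbers, and the greedy selection rule are hypothetical assumptions, not the authors' published algorithm.

```python
# Hypothetical sketch of feedback-driven backend allocation guided by a
# per-model Pareto front of (latency, memory) points. All names, numbers,
# and the selection heuristic are illustrative, not the paper's code.
from dataclasses import dataclass
from typing import List

@dataclass
class BackendConfig:
    name: str          # e.g. "CPU", "GPU", "CPU+GPU"
    latency_ms: float  # profiled inference latency
    memory_mb: float   # profiled memory footprint

def pareto_front(configs: List[BackendConfig]) -> List[BackendConfig]:
    """Keep configs not strictly dominated in both latency and memory."""
    return [c for c in configs
            if not any(o.latency_ms <= c.latency_ms and o.memory_mb <= c.memory_mb
                       and (o.latency_ms < c.latency_ms or o.memory_mb < c.memory_mb)
                       for o in configs)]

def allocate(configs: List[BackendConfig], free_memory_mb: float) -> BackendConfig:
    """Feedback step: among Pareto-optimal configs that fit in the currently
    available memory, pick the lowest-latency one."""
    feasible = [c for c in pareto_front(configs) if c.memory_mb <= free_memory_mb]
    if not feasible:
        # Under memory pressure, fall back to the smallest-footprint config.
        return min(configs, key=lambda c: c.memory_mb)
    return min(feasible, key=lambda c: c.latency_ms)

# Example: re-run allocate() for a model whenever measured memory pressure
# changes, e.g. when another model's service starts or stops.
profiles = [
    BackendConfig("GPU", latency_ms=8.0, memory_mb=420.0),
    BackendConfig("CPU", latency_ms=35.0, memory_mb=90.0),
    BackendConfig("CPU+GPU", latency_ms=14.0, memory_mb=210.0),
]
print(allocate(profiles, free_memory_mb=250.0).name)  # -> "CPU+GPU"
```

Re-invoking the allocator as free memory fluctuates captures the feedback-driven aspect: the same model may move between backends as concurrent services come and go.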