IPA: Inference Pipeline Adaptation to Achieve High Accuracy and Cost-Efficiency

Efficiently optimizing multi-model inference pipelines for fast, accurate, and cost-effective inference is a crucial challenge in machine learning production systems, given their tight end-to-end latency requirements. To simplify the exploration of the vast and intricate trade-off space of latency,...

Bibliographic Details
Published in:	arXiv.org 2024-01
Main Authors:	Ghafouri, Saeid; Razavi, Kamran; Salmani, Mehran; Sanaee, Alireza; Lorido-Botran, Tania; Wang, Lin; Doyle, Joseph; Jamshidi, Pooyan
Format:	Article
Language:	English
Subjects:	Accuracy; Adaptation; Cognitive tasks; Deep learning; Inference; Integer programming; Optimization; Pipelines; System effectiveness; Tradeoffs
Online Access:	Full text