Using Python for Model Inference in Deep Learning
Main authors:
Format: Article
Language: eng
Subjects:
Online access: Order full text
Summary: Python has become the de facto language for training deep neural networks, coupling a large suite of scientific computing libraries with efficient libraries for tensor computation such as PyTorch or TensorFlow. However, when models are used for inference they are typically extracted from Python as TensorFlow graphs or TorchScript programs in order to meet performance and packaging constraints. The extraction process can be time-consuming, impeding fast prototyping. We show how it is possible to meet these performance and packaging constraints while performing inference in Python. In particular, we present a way of using multiple Python interpreters within a single process to achieve scalable inference, and we describe a new container format for models that contains both native Python code and data. This approach simplifies the model deployment story by eliminating the model extraction step and makes it easier to integrate existing performance-enhancing Python libraries. We evaluate our design on a suite of popular PyTorch models from GitHub, showing how they can be packaged in our inference format and comparing their performance to TorchScript. For larger models, our packaged Python models perform the same as TorchScript; for smaller models, where there is some Python overhead, our multi-interpreter approach ensures that inference is still scalable.
DOI: 10.48550/arxiv.2104.00254
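The container format described in the summary bundles native Python code and model weights into one self-contained archive. The abstract does not name a concrete API, but PyTorch's torch.package module provides a container format of this kind; the sketch below is a minimal illustration using that module, with a toy model and hypothetical archive, package, and resource names rather than anything taken from the paper.

```python
import torch
from torch.package import PackageExporter, PackageImporter

# Illustrative stand-in for a real model; a deployment would also bundle
# the user-defined modules that define the model's classes.
model = torch.nn.Sequential(
    torch.nn.Linear(10, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 2),
)

# Export: write the pickled model (its code references plus its weights)
# into a single archive. torch itself is treated as an external dependency
# supplied by the serving environment rather than copied into the archive.
with PackageExporter("model_package.pt") as exporter:  # hypothetical file name
    exporter.extern(["torch", "torch.**"])
    exporter.save_pickle("model", "model.pkl", model)

# Import: reload the model from the archive (e.g. in a serving process) and run it.
importer = PackageImporter("model_package.pt")
loaded = importer.load_pickle("model", "model.pkl")
print(loaded(torch.randn(1, 10)).shape)  # torch.Size([1, 2])
```

Scalability is then handled separately by the multi-interpreter design the summary mentions: such archives can be loaded into several independent Python interpreters inside one process (PyTorch exposes a runtime of this kind as torch::deploy), so no single interpreter's global interpreter lock serializes concurrent requests, which is what keeps inference scalable for the smaller models in the evaluation.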