KubeGPU: efficient sharing and isolation mechanisms for GPU resource management in container cloud

Bibliographic details
Published in:	The Journal of supercomputing 2023, Vol.79 (1), p.591-625
Main authors:	Shen, Wenfeng; Liu, Zhengsen; Tan, Yunjie; Luo, Zhaokai; Lei, Zhou
Format:	Article
Language:	eng
Subjects:	Compilers; Computer Science; Containers; Deep learning; Interpreters; Optimization; Performance degradation; Processor Architectures; Programming Languages; Racing; Resource management
Online access:	Full text

Description
Summary:	As a growing number of containerized applications, such as high-performance computing and deep learning applications, have come to rely on GPUs, efficient GPU support in container clouds has become essential. While GPU sharing has been studied extensively for VMs, limited work has been done for containers. Existing work uses only a single, specific GPU virtualization technique to deploy containers, such as GPU pass-through or API forwarding, and lacks optimization for remote GPU virtualization. These limitations lead to low system throughput and degraded container performance, owing to the dynamic and heterogeneous nature of container resource requirements and GPU virtualization techniques, and to communication overhead and resource racing. We therefore designed and implemented KubeGPU, which extends Kubernetes to enable GPU sharing with an adaptive sharing strategy. This strategy lets KubeGPU dynamically choose the GPU virtualization technique used to deploy each container according to the available GPU resources and the container's configuration parameters, such as its GPU resource requirement, in order to achieve good container performance and system throughput. In addition, a network-aware scheduling approach and fine-grained allocation of remote GPU resources are proposed to optimize remote GPU virtualization. Finally, using representative real-world HPC and deep learning workloads, we demonstrate that KubeGPU outperforms existing approaches and that it is effective in minimizing communication overhead and eliminating remote GPU resource racing.
ISSN:	0920-8542 1573-0484
DOI:	10.1007/s11227-022-04682-2
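
Illustrative note: the summary describes choosing a GPU virtualization technique per container from the available GPU resources and the container's GPU requirement, with network-aware placement for remote GPUs. The Python sketch below only illustrates that general idea and is not KubeGPU's actual algorithm. The `Node` type, the `choose_virtualization` function, the fractional-GPU unit, the latency metric, and the decision rules are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    free_gpu: float    # free GPU capacity in units of one device (assumed unit)
    latency_ms: float  # network latency from the container's host (assumed metric)


def choose_virtualization(
    request: float, local: Node, remotes: list[Node]
) -> tuple[str, Optional[str]]:
    """Return (technique, node name) for a container requesting `request` GPUs.

    Hypothetical rules, chosen only for illustration:
    1. A whole-device request that fits locally gets GPU pass-through.
    2. Any other request that fits locally shares a GPU via API forwarding.
    3. Otherwise use a remote GPU on the lowest-latency node with capacity.
    4. If nothing fits, the container stays pending.
    """
    whole_devices = request >= 1.0 and float(request).is_integer()
    if whole_devices and local.free_gpu >= request:
        return ("pass-through", local.name)
    if local.free_gpu >= request:
        return ("api-forwarding", local.name)
    candidates = [n for n in remotes if n.free_gpu >= request]
    if candidates:
        best = min(candidates, key=lambda n: n.latency_ms)
        return ("remote-api-forwarding", best.name)
    return ("pending", None)
```

Under these assumed rules, a container asking for a quarter of a GPU on a host with half a GPU free would be placed locally with API forwarding, while a request for a full GPU on the same host would fall through to the remote node with the lowest latency.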