AntBatchInfer: Elastic Batch Inference in the Kubernetes Cluster

Offline batch inference is a common task in the industry for deep learning applications, but it can be challenging to ensure stability and performance when dealing with large amounts of data and complicated inference pipelines. This paper demonstrates AntBatchInfer, an elastic batch inference framework specially optimized for non-dedicated clusters. AntBatchInfer addresses these challenges by providing multi-level fault-tolerant capabilities, enabling the stable execution of versatile and long-running inference tasks. It also improves inference efficiency through pipelining, intra-node, and inter-node scaling, and further optimizes performance in complicated multiple-model batch inference scenarios. Through extensive experiments and real-world statistics, we demonstrate the superiority of our framework in terms of stability and efficiency: it outperforms the baseline by at least $2\times$ and $6\times$ in single-model and multiple-model batch inference, respectively. AntBatchInfer is also widely used at Ant Group, with thousands of daily jobs from various scenarios, including DLRM, CV, and NLP, which proves its practicability in the industry.
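The abstract's pipelining claim is the easiest piece to make concrete: overlapping data loading with model execution keeps the accelerator busy while the next batch is being fetched. The sketch below illustrates that general idea only; it is not the AntBatchInfer API, and the callables load_batches and run_model are hypothetical stand-ins for the framework's data source and model executor.

    import queue
    import threading

    def pipelined_inference(load_batches, run_model, depth=8):
        """Overlap batch loading (I/O-bound) with inference (compute-bound).

        load_batches: zero-arg callable returning an iterable of batches.
        run_model:    callable mapping one batch to its predictions.
        depth:        maximum number of prefetched batches held in memory.
        """
        batches = queue.Queue(maxsize=depth)  # bounded queue applies backpressure
        _SENTINEL = object()                  # marks the end of the stream

        def loader():
            # Producer: fetch batches ahead of the consumer.
            for batch in load_batches():
                batches.put(batch)
            batches.put(_SENTINEL)

        t = threading.Thread(target=loader, daemon=True)
        t.start()

        results = []
        while True:
            batch = batches.get()
            if batch is _SENTINEL:
                break
            # Consumer: run inference while the loader prefetches the next batch.
            results.append(run_model(batch))
        t.join()
        return results

    # Toy usage: "inference" here just doubles every element of each batch.
    preds = pipelined_inference(lambda: [[1, 2], [3, 4]],
                                lambda b: [x * 2 for x in b])
    assert preds == [[2, 4], [6, 8]]

In a Kubernetes setting, the intra-node and inter-node scaling the paper describes would roughly correspond to running several such consumers per pod and adding or removing pods as cluster capacity changes, but those mechanics are beyond this sketch.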

Bibliographic details

Main authors: Li, Siyuan; Xiao, Youshao; Meng, Fanzhuang; Ju, Lin; Liang, Lei; Wang, Lin; Zhou, Jun
Format: Article
Language: English
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Learning
DOI: 10.48550/arxiv.2404.09686
Date: 2024-04-15
Source: arXiv.org
Online access: https://arxiv.org/abs/2404.09686