AntBatchInfer: Elastic Batch Inference in the Kubernetes Cluster

Offline batch inference is a common task in the industry for deep learning applications, but it can be challenging to ensure stability and performance when dealing with large amounts of data and complicated inference pipelines. This paper demonstrates AntBatchInfer, an elastic batch inference framework specially optimized for non-dedicated clusters. AntBatchInfer addresses these challenges by providing multi-level fault-tolerant capabilities, enabling the stable execution of versatile and long-running inference tasks. It also improves inference efficiency through pipelining, intra-node, and inter-node scaling, and further optimizes performance in complicated multiple-model batch inference scenarios. Through extensive experiments and real-world statistics, we demonstrate the superiority of our framework in terms of stability and efficiency: it outperforms the baseline by at least $2\times$ and $6\times$ in single-model and multiple-model batch inference, respectively. AntBatchInfer is also widely used at Ant Group, with thousands of daily jobs from various scenarios, including DLRM, CV, and NLP, which proves its practicability in the industry.
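The abstract's pipelining claim is the easiest piece to make concrete: overlapping data loading with model execution keeps the accelerator busy while the next batch is being fetched. The sketch below illustrates that general idea only; it is not the AntBatchInfer API, and the callables load_batches and run_model are hypothetical stand-ins for the framework's data source and model executor.

    import queue
    import threading

    def pipelined_inference(load_batches, run_model, depth=8):
        """Overlap batch loading (I/O-bound) with inference (compute-bound).

        load_batches: zero-arg callable returning an iterable of batches.
        run_model:    callable mapping one batch to its predictions.
        depth:        maximum number of prefetched batches held in memory.
        """
        batches = queue.Queue(maxsize=depth)  # bounded queue applies backpressure
        _SENTINEL = object()                  # marks the end of the stream

        def loader():
            # Producer: fetch batches ahead of the consumer.
            for batch in load_batches():
                batches.put(batch)
            batches.put(_SENTINEL)

        t = threading.Thread(target=loader, daemon=True)
        t.start()

        results = []
        while True:
            batch = batches.get()
            if batch is _SENTINEL:
                break
            # Consumer: run inference while the loader prefetches the next batch.
            results.append(run_model(batch))
        t.join()
        return results

    # Toy usage: "inference" here just doubles every element of each batch.
    preds = pipelined_inference(lambda: [[1, 2], [3, 4]],
                                lambda b: [x * 2 for x in b])
    assert preds == [[2, 4], [6, 8]]

In a Kubernetes setting, the intra-node and inter-node scaling the paper describes would roughly correspond to running several such consumers per pod and adding or removing pods as cluster capacity changes, but those mechanics are beyond this sketch.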

Bibliographic details

Main authors: Li, Siyuan; Xiao, Youshao; Meng, Fanzhuang; Ju, Lin; Liang, Lei; Wang, Lin; Zhou, Jun
Format: Article
Language: English
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Learning
DOI: 10.48550/arxiv.2404.09686
Date: 2024-04-15
Source: arXiv.org
Online access: https://arxiv.org/abs/2404.09686