FusedInf: Efficient Swapping of DNN Models for On-Demand Serverless Inference Services on the Edge

Edge AI computing boxes are a new class of computing devices that aim to revolutionize the AI industry. These compact, robust hardware units bring the power of AI processing directly to the source of data, at the edge of the network. Meanwhile, on-demand serverless inference service...

Detailed Description

Saved in:
Bibliographic Details
Main Authors: Taki, Sifat Ut, Padmanabhan, Arthi, Mastorakis, Spyridon
Format: Article
Language: English
Subjects: Computer Science - Learning
Online Access: Full text at https://arxiv.org/abs/2410.21120
creator Taki, Sifat Ut; Padmanabhan, Arthi; Mastorakis, Spyridon
description Edge AI computing boxes are a new class of computing devices that aim to revolutionize the AI industry. These compact, robust hardware units bring the power of AI processing directly to the source of data, at the edge of the network. Meanwhile, on-demand serverless inference services are becoming increasingly popular because they minimize the infrastructure cost of hosting and running DNN models for small and medium-sized businesses. However, these computing devices are still constrained in terms of resource availability, so service providers need to load and unload models efficiently to meet the growing demand. In this paper, we introduce FusedInf to efficiently swap DNN models for on-demand serverless inference services on the edge. FusedInf combines multiple models into a single Directed Acyclic Graph (DAG) to load the models into GPU memory efficiently and speed up execution. Our evaluation of popular DNN models showed that creating a single DAG can make execution up to 14% faster while reducing the memory requirement by up to 17%. The prototype implementation is available at https://github.com/SifatTaj/FusedInf. (A minimal code sketch of the fusion idea follows the record fields below.)
format Article
creationdate 2024-10-28
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
identifier DOI: 10.48550/arxiv.2410.21120
language eng
source arXiv.org
subjects Computer Science - Learning
title FusedInf: Efficient Swapping of DNN Models for On-Demand Serverless Inference Services on the Edge
url https://arxiv.org/abs/2410.21120
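The fusion idea described in the abstract, combining several DNNs into one directed acyclic graph so that a single load brings all of their weights into GPU memory and one call executes them, can be illustrated with a minimal sketch. This is a sketch under assumptions, not the paper's implementation: PyTorch as the framework and the FusedModule class are hypothetical choices made for illustration; the authoritative code is the prototype at https://github.com/SifatTaj/FusedInf.

```python
# Minimal sketch of fusing several DNNs into one executable module.
# Hypothetical illustration only; not the FusedInf implementation.
import torch
import torch.nn as nn
import torchvision.models as models

class FusedModule(nn.Module):
    """Wraps independent models in a single module so that one
    .to(device) call loads every weight tensor and one forward
    pass can drive all of the fused models."""
    def __init__(self, submodels):
        super().__init__()
        self.submodels = nn.ModuleList(submodels)

    def forward(self, x):
        # For illustration, run every fused model on the same input.
        return [m(x) for m in self.submodels]

device = "cuda" if torch.cuda.is_available() else "cpu"
fused = FusedModule([models.resnet18(), models.mobilenet_v2()])
fused = fused.to(device).eval()  # one load into (GPU) memory
with torch.no_grad():
    outputs = fused(torch.randn(1, 3, 224, 224, device=device))
print([tuple(o.shape) for o in outputs])  # [(1, 1000), (1, 1000)]
```

In this framing, swapping models in or out amounts to rebuilding the fused module with a different set of submodels; the abstract reports that executing models as one DAG is up to 14% faster and needs up to 17% less memory than handling them separately.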