FusedInf: Efficient Swapping of DNN Models for On-Demand Serverless Inference Services on the Edge
Edge AI computing boxes are a new class of computing devices that aim to revolutionize the AI industry. These compact and robust hardware units bring the power of AI processing directly to the source of data, on the edge of the network. On the other hand, on-demand serverless inference service...
Main authors: | Taki, Sifat Ut ; Padmanabhan, Arthi ; Mastorakis, Spyridon |
---|---|
Format: | Article |
Language: | eng |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Taki, Sifat Ut ; Padmanabhan, Arthi ; Mastorakis, Spyridon |
description | Edge AI computing boxes are a new class of computing devices that aim to revolutionize the AI industry. These compact and robust hardware units bring the power of AI processing directly to the source of data, on the edge of the network. On the other hand, on-demand serverless inference services are becoming increasingly popular because they minimize the infrastructure cost of hosting and running DNN models for small to medium-sized businesses. However, these computing devices are still constrained in terms of resource availability. As such, service providers need to load and unload models efficiently to meet the growing demand. In this paper, we introduce FusedInf to efficiently swap DNN models for on-demand serverless inference services on the edge. FusedInf combines multiple models into a single Directed Acyclic Graph (DAG) to load the models into GPU memory efficiently and make execution faster. Our evaluation of popular DNN models showed that creating a single DAG can make the execution of the models up to 14% faster while reducing the memory requirement by up to 17%. The prototype implementation is available at https://github.com/SifatTaj/FusedInf. |
doi_str_mv | 10.48550/arxiv.2410.21120 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2410_21120</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2410_21120</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2410_211203</originalsourceid><addsrcrecordid>eNqFjr0OgjAURrs4GPUBnLwvAPKbGFeB6CAOuJNKb7EJ3JIWUd9eJO5OX3LyneQwtvY9N9rFsbfl5qUGN4hGEPh-4M3ZLXtYFCeSe0ilVJVC6qF48q5TVIOWkOQ5nLXAxoLUBi7kJNhyElCgGdA0aC2MOhqkCieoKrSgCfo7QipqXLKZ5I3F1W8XbJOl18PRmWrKzqiWm3f5rSqnqvD_4wMqQUGo</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>FusedInf: Efficient Swapping of DNN Models for On-Demand Serverless Inference Services on the Edge</title><source>arXiv.org</source><creator>Taki, Sifat Ut ; Padmanabhan, Arthi ; Mastorakis, Spyridon</creator><creatorcontrib>Taki, Sifat Ut ; Padmanabhan, Arthi ; Mastorakis, Spyridon</creatorcontrib><description>Edge AI computing boxes are a new class of computing devices that are aimed
to revolutionize the AI industry. These compact and robust hardware units bring
the power of AI processing directly to the source of data--on the edge of the
network. On the other hand, on-demand serverless inference services are
becoming more and more popular as they minimize the infrastructural cost
associated with hosting and running DNN models for small to medium-sized
businesses. However, these computing devices are still constrained in terms of
resource availability. As such, the service providers need to load and unload
models efficiently in order to meet the growing demand. In this paper, we
introduce FusedInf to efficiently swap DNN models for on-demand serverless
inference services on the edge. FusedInf combines multiple models into a single
Directed Acyclic Graph (DAG) to efficiently load the models into the GPU memory
and make execution faster. Our evaluation of popular DNN models showed that
creating a single DAG can make the execution of the models up to 14% faster
while reducing the memory requirement by up to 17%. The prototype
implementation is available at https://github.com/SifatTaj/FusedInf.</description><identifier>DOI: 10.48550/arxiv.2410.21120</identifier><language>eng</language><subject>Computer Science - Learning</subject><creationdate>2024-10</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2410.21120$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2410.21120$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Taki, Sifat Ut</creatorcontrib><creatorcontrib>Padmanabhan, Arthi</creatorcontrib><creatorcontrib>Mastorakis, Spyridon</creatorcontrib><title>FusedInf: Efficient Swapping of DNN Models for On-Demand Serverless Inference Services on the Edge</title><description>Edge AI computing boxes are a new class of computing devices that are aimed
to revolutionize the AI industry. These compact and robust hardware units bring
the power of AI processing directly to the source of data--on the edge of the
network. On the other hand, on-demand serverless inference services are
becoming more and more popular as they minimize the infrastructural cost
associated with hosting and running DNN models for small to medium-sized
businesses. However, these computing devices are still constrained in terms of
resource availability. As such, the service providers need to load and unload
models efficiently in order to meet the growing demand. In this paper, we
introduce FusedInf to efficiently swap DNN models for on-demand serverless
inference services on the edge. FusedInf combines multiple models into a single
Directed Acyclic Graph (DAG) to efficiently load the models into the GPU memory
and make execution faster. Our evaluation of popular DNN models showed that
creating a single DAG can make the execution of the models up to 14% faster
while reducing the memory requirement by up to 17%. The prototype
implementation is available at https://github.com/SifatTaj/FusedInf.</description><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNqFjr0OgjAURrs4GPUBnLwvAPKbGFeB6CAOuJNKb7EJ3JIWUd9eJO5OX3LyneQwtvY9N9rFsbfl5qUGN4hGEPh-4M3ZLXtYFCeSe0ilVJVC6qF48q5TVIOWkOQ5nLXAxoLUBi7kJNhyElCgGdA0aC2MOhqkCieoKrSgCfo7QipqXLKZ5I3F1W8XbJOl18PRmWrKzqiWm3f5rSqnqvD_4wMqQUGo</recordid><startdate>20241028</startdate><enddate>20241028</enddate><creator>Taki, Sifat Ut</creator><creator>Padmanabhan, Arthi</creator><creator>Mastorakis, Spyridon</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20241028</creationdate><title>FusedInf: Efficient Swapping of DNN Models for On-Demand Serverless Inference Services on the Edge</title><author>Taki, Sifat Ut ; Padmanabhan, Arthi ; Mastorakis, Spyridon</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2410_211203</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Taki, Sifat Ut</creatorcontrib><creatorcontrib>Padmanabhan, Arthi</creatorcontrib><creatorcontrib>Mastorakis, Spyridon</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Taki, Sifat Ut</au><au>Padmanabhan, Arthi</au><au>Mastorakis, Spyridon</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>FusedInf: Efficient Swapping of DNN Models for On-Demand Serverless Inference Services on the Edge</atitle><date>2024-10-28</date><risdate>2024</risdate><abstract>Edge AI computing boxes are a new class of computing devices that are 
aimed to revolutionize the AI industry. These compact and robust hardware units bring
the power of AI processing directly to the source of data--on the edge of the
network. On the other hand, on-demand serverless inference services are
becoming more and more popular as they minimize the infrastructural cost
associated with hosting and running DNN models for small to medium-sized
businesses. However, these computing devices are still constrained in terms of
resource availability. As such, the service providers need to load and unload
models efficiently in order to meet the growing demand. In this paper, we
introduce FusedInf to efficiently swap DNN models for on-demand serverless
inference services on the edge. FusedInf combines multiple models into a single
Directed Acyclic Graph (DAG) to efficiently load the models into the GPU memory
and make execution faster. Our evaluation of popular DNN models showed that
creating a single DAG can make the execution of the models up to 14% faster
while reducing the memory requirement by up to 17%. The prototype
implementation is available at https://github.com/SifatTaj/FusedInf.</abstract><doi>10.48550/arxiv.2410.21120</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2410.21120 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2410_21120 |
source | arXiv.org |
subjects | Computer Science - Learning |
title | FusedInf: Efficient Swapping of DNN Models for On-Demand Serverless Inference Services on the Edge |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T10%3A25%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=FusedInf:%20Efficient%20Swapping%20of%20DNN%20Models%20for%20On-Demand%20Serverless%20Inference%20Services%20on%20the%20Edge&rft.au=Taki,%20Sifat%20Ut&rft.date=2024-10-28&rft_id=info:doi/10.48550/arxiv.2410.21120&rft_dat=%3Carxiv_GOX%3E2410_21120%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |