Collaborative Inference Acceleration with Non-Penetrative Tensor Partitioning

The inference of large-sized images on Internet of Things (IoT) devices is commonly hindered by limited resources, while Deep Neural Network (DNN) inference often faces stringent latency requirements. Currently, this problem is generally addressed by collaborative inference, where the large-sized image is partitioned into multiple tiles and each tile is assigned to an IoT device for processing. However, because tile sharing incurs significant communication overhead, the existing collaborative inference strategy is inefficient for convolutional computation, which is indispensable to any DNN. To reduce this overhead, we propose Non-Penetrative Tensor Partitioning (NPTP), a fine-grained tensor partitioning method that lowers communication latency by minimizing the communication load of shared tiles, thereby reducing inference latency. We evaluate NPTP with four widely adopted DNN models. Experimental results demonstrate that NPTP achieves a 1.44x-1.68x inference speedup relative to CoEdge, a state-of-the-art (SOTA) collaborative inference algorithm.
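The tile-sharing overhead the abstract refers to arises from halo exchange: for a convolution to run independently on each device, adjacent tiles must swap strips of border pixels. The sketch below (Python) estimates that shared load for a uniform grid partition. It is not the paper's NPTP algorithm, which is not reproduced in this record; the function and parameter names are illustrative assumptions only.

    # Illustrative sketch only: estimate the halo-exchange load for a
    # uniform grid partition of a conv layer's input. This is NOT the
    # paper's NPTP algorithm; all names and parameters are assumptions.

    def halo_exchange_bytes(height, width, rows, cols, kernel_size,
                            channels, bytes_per_value=4):
        """Bytes exchanged so each tile can run a k x k convolution locally.

        Each interior edge of the partition grid forces both adjacent
        tiles to receive a strip of (kernel_size - 1) // 2 pixels from
        its neighbor, hence the factor of 2 per edge.
        """
        halo = (kernel_size - 1) // 2
        tile_h, tile_w = height / rows, width / cols
        vertical_edges = (cols - 1) * rows    # interior edges of length tile_h
        horizontal_edges = (rows - 1) * cols  # interior edges of length tile_w
        shared_pixels = 2 * halo * (vertical_edges * tile_h
                                    + horizontal_edges * tile_w)
        return shared_pixels * channels * bytes_per_value

    # Example: a 2048 x 2048, 3-channel input split over a 2 x 2 device
    # grid for a 3 x 3 convolution (float32 activations).
    load = halo_exchange_bytes(2048, 2048, rows=2, cols=2,
                               kernel_size=3, channels=3)
    print(f"halo exchange for this layer: {load / 1e6:.3f} MB")

Per the abstract, this shared load is exactly the quantity NPTP's fine-grained partitioning minimizes: choosing partition boundaries so that less border data must cross between devices reduces communication latency and, in turn, end-to-end inference latency.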

Bibliographic Details
Main Authors: Liu, Zhibang; Xu, Chaonong; Lv, Zhenjie; Liu, Zhizhuo; Zhao, Suyu
Format: Article
Language: English
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
DOI: 10.48550/arxiv.2501.04489
Published: 2025-01-08
Source: arXiv.org
Online Access: https://arxiv.org/abs/2501.04489