Memory optimization at Edge for Distributed Convolution Neural Network

Bibliographic details
Published in: Transactions on emerging telecommunications technologies 2022-12, Vol.33 (12), p.n/a
Main authors: Naveen, Soumyalatha, Kounte, Manjunath R.
Format: Article
Language: English
Online access: Full text
Description
Summary: Internet of Things (IoT) edge intelligence has emerged by optimizing the deep learning (DL) models deployed on resource‐constrained devices for quick decision‐making. In addition, edge intelligence reduces network overload and latency by bringing intelligent analytics closer to the source. On the other hand, DL models need a lot of computing resources. As a result, they have high computational workloads and a large memory footprint, making them impractical to deploy and execute on IoT edge devices with limited capabilities. In addition, existing layer‐based partitioning methods generate many intermediate results, resulting in a huge memory footprint. In this article, we propose a framework to provide a comprehensive solution that enables the deployment of convolutional neural networks (CNNs) onto distributed IoT devices for faster inference and reduced memory footprint. This framework considers a pretrained YOLOv2 model, and a weight pruning technique is applied to the pretrained model to reduce the number of non‐contributing parameters. We use the fused layer partitioning method to vertically partition the fused layers of the CNN and then distribute the partitions among the edge devices to process the input. In our experiment, we have considered multiple Raspberry Pi devices as edge devices. A Raspberry Pi with a neural computing stick serves as a gateway device that combines the results from the various edge devices to produce the final output. Our proposed model achieved an inference latency of 5 to ∼7 seconds for 3×3 to 5×5 fused layer partitioning on five devices, with a 9% improvement in memory footprint. The proposed model provides a comprehensive solution that maps a convolutional neural network (CNN) onto Internet of Things devices for faster inference and a reduced memory footprint.
First, the weight pruning is applied on the CNN pretrained model, followed by fused tile partitioning for distributing tasks to the edge devices for parallel execution.
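
The two steps named in the summary (weight pruning, then tile partitioning across devices) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The record does not state the pruning criterion, the tile assignment policy, or the input size, so the magnitude-based pruning rule, the round-robin assignment, the 416×416 map, and the function names `prune_by_magnitude`, `grid_tiles`, and `assign_round_robin` are all assumptions made for this example. A real fused-layer scheme also has to give each tile an overlapping input border so that stacked convolutions produce correct edge values, which this sketch omits.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of a flat weight list.

    Assumption: simple magnitude pruning; the article's criterion may differ.
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Indices sorted by absolute value, smallest first; drop the first k.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def grid_tiles(height, width, rows, cols):
    """Split a height x width feature map into rows x cols non-overlapping tiles.

    Each tile is (top, left, bottom, right) with exclusive bottom/right.
    """
    tiles = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            tiles.append((top, left, bottom, right))
    return tiles


def assign_round_robin(tiles, n_devices):
    """Hypothetical scheduling policy: hand tiles to devices in turn."""
    return {d: tiles[d::n_devices] for d in range(n_devices)}


pruned = prune_by_magnitude([0.5, -0.1, 0.05, -2.0], sparsity=0.5)
tiles = grid_tiles(416, 416, rows=3, cols=3)      # 3x3 partitioning
plan = assign_round_robin(tiles, n_devices=5)     # five edge devices
print(pruned)
print({d: len(t) for d, t in plan.items()})
```

With a 3×3 grid and five devices, four devices receive two tiles and one receives a single tile. In the setup the summary describes, a gateway device would then combine the per-tile outputs into the final result.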
ISSN:2161-3915
DOI:10.1002/ett.4648