Resource-Efficient Visual Multiobject Tracking on Embedded Device


Bibliographic Details
Published in: IEEE Internet of Things Journal, 2022-06, Vol. 9 (11), pp. 8531-8543
Authors: Tu, Jingzheng; Chen, Cailian; Xu, Qimin; Yang, Bo; Guan, Xinping
Format: Article
Language: English
Abstract: Multiobject tracking (MOT) is a crucial technology for security surveillance, and it is computationally intensive because a large number of video streams must be processed with low latency in practice. The input video streams of MOT are processed in a cloud computing center with abundant computational capability, which puts heavy pressure on delivering the video streams to the cloud. Recent advances in Internet of Things (IoT) technology provide edge-computing-based solutions for video analytics at scale. However, the gap between MOT's high demand for computational capability and the resource-constrained nature of IoT devices remains significant. In this article, a resource-efficient MOT (REMOT) method is proposed for real-time surveillance on IoT embedded devices, including an affinity measurement based on an appearance model with an angular triplet loss and a motion association that replaces the time-consuming graph-based data association stage. Considering the tradeoff between latency and accuracy, we design an optimization strategy for the parallel processing of the deep learning models' layers to accelerate inference with less accuracy loss. In addition, we employ a model compression strategy to reduce the model size. Experiments on the MOT16 and MOT17 benchmarks demonstrate that REMOT reduces latency by 2.4× compared with the original implementation and achieves a running speed of 81 frames per second (fps) on an embedded device with only a marginal accuracy loss (6%), which meets the requirements of real-time processing and low-latency response for surveillance.
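
The abstract names an appearance model trained with an angular triplet loss for affinity measurement, but does not spell out the formulation. The following is a minimal sketch, assuming a generic angular (cosine-distance) triplet loss in PyTorch; the function name, margin value, and embedding dimension are illustrative choices, not taken from the paper.

    import torch
    import torch.nn.functional as F

    def angular_triplet_loss(anchor: torch.Tensor,
                             positive: torch.Tensor,
                             negative: torch.Tensor,
                             margin: float = 0.3) -> torch.Tensor:
        # Sketch of a generic angular triplet loss (margin is an assumed value).
        # Embeddings are L2-normalized, so their dot product equals the cosine
        # of the angle between them; the loss pushes the anchor-positive angular
        # distance below the anchor-negative one by at least `margin`.
        a = F.normalize(anchor, dim=-1)
        p = F.normalize(positive, dim=-1)
        n = F.normalize(negative, dim=-1)
        d_ap = 1.0 - (a * p).sum(dim=-1)  # cosine distance anchor <-> positive
        d_an = 1.0 - (a * n).sum(dim=-1)  # cosine distance anchor <-> negative
        return F.relu(d_ap - d_an + margin).mean()

    # Usage sketch: 128-D appearance embeddings for a batch of 32 triplets.
    loss = angular_triplet_loss(torch.randn(32, 128),
                                torch.randn(32, 128),
                                torch.randn(32, 128))

Working in normalized (angular) space keeps the affinity comparable across identities regardless of embedding magnitude, which is one common motivation for angular variants of the triplet loss.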
ISSN: 2327-4662
DOI: 10.1109/JIOT.2021.3115102