Video Scene Segmentation Using Tensor-Train Faster-RCNN for Multimedia IoT Systems

Bibliographic details
Published in: IEEE Internet of Things Journal, 2021-06, Vol. 8 (12), pp. 9697-9705
Main authors: Dai, Cheng; Liu, Xingang; Yang, Laurence T.; Ni, Minghao; Ma, Zhenchao; Zhang, Qingchen; Deen, M. Jamal
Format: Article
Language: English
Description
Summary: Video surveillance techniques like scene segmentation are playing an increasingly important role in multimedia Internet-of-Things (IoT) systems. However, existing deep learning-based methods face challenges in both accuracy and memory when deployed on edge computing devices with limited computing resources. To address these challenges, a tensor-train video scene segmentation scheme that compares the local background information in regional scene boundary boxes in adjacent frames is proposed. Compared to the existing methods, the proposed scheme can achieve competitive performance in both segmentation accuracy and parameter compression rate. In detail, first, an improved faster region convolutional neural network (faster-RCNN) model is proposed to recognize and generate a large number of region boxes with foreground and background to achieve boundary boxes. Then, the foreground boxes with sparse objects are removed and the rest are considered as optional background boxes used to measure the similarity between two adjacent frames. Second, to accelerate the training efficiency and reduce memory size, a general and efficient training way using tensor-train decomposition to factor the input-to-hidden weight matrix is proposed. Finally, experiments are conducted to evaluate the performance of the proposed scheme in terms of accuracy and model compression. Our results demonstrate that the proposed model can improve the training efficiency and save the memory space for the deep computation model with good accuracy. This work opens the potential for the use of artificial intelligence methods in edge computing devices for multimedia IoT systems.
ISSN: 2327-4662
DOI: 10.1109/JIOT.2020.3022353
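
Illustrative note: the summary above mentions factoring an input-to-hidden weight matrix with tensor-train decomposition to reduce memory. The sketch below shows the generic TT-SVD procedure applied to a matrix, using NumPy. It is not the authors' training scheme, which the record does not describe in detail. The function names (`tt_matrix_decompose`, `tt_matrix_reconstruct`), the mode sizes, and the rank cap are all assumptions chosen for illustration.

```python
import numpy as np

def tt_matrix_decompose(W, row_dims, col_dims, max_rank):
    """Generic TT-SVD of a matrix W with shape (prod(row_dims), prod(col_dims)).

    Returns a list of 4-D cores with shapes (r_prev, m_k, n_k, r_next).
    Hypothetical helper for illustration, not the paper's implementation.
    """
    d = len(row_dims)
    # View W as a tensor (m1..md, n1..nd), then pair each row mode with
    # its column mode: (m1, n1, m2, n2, ..., md, nd).
    T = W.reshape(list(row_dims) + list(col_dims))
    T = T.transpose([ax for k in range(d) for ax in (k, d + k)])
    modes = [m * n for m, n in zip(row_dims, col_dims)]

    cores, rank = [], 1
    C = T.reshape(rank * modes[0], -1)
    for k in range(d - 1):
        U, S, Vt = np.linalg.svd(C, full_matrices=False)
        r = min(max_rank, len(S))  # truncate to the rank cap
        cores.append(U[:, :r].reshape(rank, row_dims[k], col_dims[k], r))
        # Carry the remaining factor on to the next mode.
        C = (S[:r, None] * Vt[:r]).reshape(r * modes[k + 1], -1)
        rank = r
    cores.append(C.reshape(rank, row_dims[-1], col_dims[-1], 1))
    return cores

def tt_matrix_reconstruct(cores):
    """Contract the cores back into a dense matrix."""
    d = len(cores)
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    full = full[0, ..., 0]  # drop the boundary ranks of size 1
    # Undo the interleaving: (m1, n1, ..., md, nd) -> (m1..md, n1..nd).
    perm = list(range(0, 2 * d, 2)) + list(range(1, 2 * d, 2))
    full = full.transpose(perm)
    rows = int(np.prod(full.shape[:d]))
    cols = int(np.prod(full.shape[d:]))
    return full.reshape(rows, cols)

rng = np.random.default_rng(0)
# A Kronecker product has TT-rank 1 in this format, so rank 1 is exact.
W_kron = np.kron(rng.standard_normal((4, 4)), rng.standard_normal((4, 4)))
cores_kron = tt_matrix_decompose(W_kron, (4, 4), (4, 4), max_rank=1)
n_params_kron = sum(c.size for c in cores_kron)  # 32 values instead of 256

# A generic dense matrix, compressed with a rank cap of 4 (lossy).
W_dense = rng.standard_normal((16, 16))
cores_dense = tt_matrix_decompose(W_dense, (4, 4), (4, 4), max_rank=4)
n_params_dense = sum(c.size for c in cores_dense)
```

The rank cap trades accuracy for size: a matrix with exact low tensor-train rank is recovered without loss, while a generic dense matrix is only approximated once the cap falls below its true rank.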