Improved Feature Extraction and Similarity Algorithm for Video Object Detection

Bibliographic Details
Published in:	Information (Basel) 2023-02, Vol.14 (2), p.115
Main authors:	You, Haotian; Lu, Yufang; Tang, Haihua
Format:	Article
Language:	English
Subjects:	Accuracy; Agglomeration; Algorithms; Analysis; Clustering; Computer vision; Datasets; Deep learning; Detectors; dynamic region aware convolution; faster RCNN; Feature extraction; feature pyramid; Frames (data processing); Image classification; Machine vision; Methods; Modules; Moving object recognition; Occlusion; Optical flow (image analysis); Proposals; Redundancy; Semantics; Similarity; similarity algorithms; video object detection
Online access:	Full text

Description
Abstract:	Video object detection is an important research direction in computer vision. The task is to detect and classify moving objects in a sequence of images. Building on static-image object detectors, most existing video object detection methods exploit the temporal correlation unique to video to reduce the missed and false detections caused by occlusion and blur of moving objects. Another widely used class of models is guided by an optical flow network: features of adjacent frames are aggregated by estimating the optical flow field. However, aggregating features of adjacent frames involves many redundant computations. This paper first improves Faster RCNN with a Feature Pyramid and Dynamic Region Aware Convolution. It then proposes the S-SELSA module, designed from the perspective of semantic and feature similarity, with feature similarity obtained by a modified SSIM algorithm. The module aggregates frame features globally to avoid redundancy. Finally, experimental results on the ImageNet VID and DET datasets show that the proposed method achieves an mAP of 83.55%, which is higher than that of existing methods.
ISSN:	2078-2489
DOI:	10.3390/info14020115