Fine-grained traffic video vehicle recognition based orientation estimation and temporal information

Bibliographic details
Published in:	Multimedia Tools and Applications, 2023-04, Vol. 82 (9), p. 13745-13763
Main authors:	Hu, Anqi; Sun, Zhengxing; Li, Qian; Xu, Yechao; Zhu, Yihuan; Zhang, Sheng
Format:	Article
Language:	English
Subjects:	Chassis; Computer Communication Networks; Computer Science; Data Structures and Information Theory; Datasets; Frames (data processing); Multimedia Information Systems; Orientation; Performance evaluation; Pose estimation; Recurrent neural networks; Spatial data; Special Purpose and Application-Based Systems; Surveillance; Traffic information; Traffic surveillance
Online access:	Full text

Description
Abstract:	In this paper, we propose a method for fine-grained vehicle recognition in traffic surveillance video. In contrast to general approaches to single-image fine-grained recognition, this method focuses on combining information from multiple frames and on the viewpoint changes across videos. First, we detect vehicle instances and their local frames in the input traffic video by vehicle tracking. For each vehicle instance, pose estimation is used to extract the 3D orientation in the corresponding frame. We encode the 3D orientation as an extra supervisory cue and merge it with the CNN feature to represent the vehicle's appearance and how it changes as the vehicle moves. In addition, a recurrent neural network (RNN) is used to select informative content from the traffic video and to fuse the CNN features of each vehicle's frames into a comprehensive feature that includes both spatial and temporal information for fine-grained recognition. We run experiments on our own CarVideo dataset, which was collected by surveillance cameras, and on the open BoxCars116k dataset for performance evaluation. The experiments show that our method outperforms state-of-the-art methods for fine-grained recognition in traffic video applications.
ISSN:	1380-7501 1573-7721
DOI:	10.1007/s11042-022-13811-1
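
Illustrative sketch: the abstract describes merging an encoded 3D orientation with each frame's CNN feature and fusing the per-frame features of a tracked vehicle with an RNN. The record does not give the paper's actual encoding or network, so the following is only a minimal sketch under stated assumptions: the orientation is taken to be three angles encoded as sine/cosine pairs, the RNN is a plain Elman cell with random weights, and the CNN features are random placeholder vectors. All function and variable names are hypothetical.

```python
import math
import random


def encode_orientation(yaw, pitch, roll):
    """Encode three angles (radians) as sin/cos pairs.

    Assumption: a sin/cos encoding is used so that values stay continuous
    across the +/-pi wrap-around. The paper's encoding may differ.
    """
    encoded = []
    for angle in (yaw, pitch, roll):
        encoded.extend((math.sin(angle), math.cos(angle)))
    return encoded


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights (stand-in for trained weights)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def fuse_track(frame_features, orientations, w_in, w_rec):
    """Fuse per-frame features of one vehicle track with a plain Elman RNN.

    Each step concatenates the frame feature with its encoded orientation,
    then updates the hidden state. The final hidden state serves as the
    track-level descriptor.
    """
    hidden = [0.0] * len(w_rec)
    for feature, angles in zip(frame_features, orientations):
        step_input = list(feature) + encode_orientation(*angles)
        from_input = matvec(w_in, step_input)
        from_hidden = matvec(w_rec, hidden)
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_hidden)]
    return hidden


# Placeholder data: 5 frames of one tracked vehicle, 8-dimensional "CNN" features.
rng = random.Random(0)
feat_dim, hidden_dim, n_frames = 8, 4, 5
w_in = make_matrix(hidden_dim, feat_dim + 6, rng)  # +6 for the sin/cos encoding
w_rec = make_matrix(hidden_dim, hidden_dim, rng)
features = [[rng.random() for _ in range(feat_dim)] for _ in range(n_frames)]
angles = [(0.1 * t, 0.0, 0.0) for t in range(n_frames)]  # yaw drifts as the vehicle moves

track_descriptor = fuse_track(features, angles, w_in, w_rec)
print(len(track_descriptor))
```

In a full system the track descriptor would feed a classifier over fine-grained vehicle classes, and the weights would be learned rather than random; neither step is shown here.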