Video Behavior Recognition Model Based on Spatial Long-Distance Modeling Combined With Time-Domain Shift

Bibliographic Details
Published in: IEEE Access, 2024, Vol. 12, p. 99213-99222
Authors: Sun, Degang; Zhou, Yanyu; Hu, Zhengping
Format: Article
Language: English
Description
Abstract: During the feature extraction stage, video behavior recognition algorithms have a limited ability to extract long-range target information and temporal motion information, resulting in unsatisfactory classification results. To enhance the network's expressive capability, this study proposes a video behavior recognition algorithm that combines spatial long-distance modeling with a temporal shift. To efficiently extract time-domain motion features in the 2D backbone network, the temporal shift module is coupled with the residual connection. At the same time, a narrow, elongated kernel, namely a 1×N or N×1 strip pool, is introduced so that the backbone can effectively capture long-range information in the spatial domain and obtain the contextual relations of distant targets. Experiments on the Something-Something V1 and Jester datasets achieve average recognition accuracies of 45.82% and 96.89%, respectively. The results demonstrate that the proposed algorithm can fully extract the spatio-temporal features of videos and holds certain advantages over other existing behavior recognition networks.
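The abstract names two building blocks: a temporal shift module coupled with the residual path of a 2D backbone, and 1×N / N×1 strip pooling for long-range spatial context. As a rough illustration only, the PyTorch-style sketch below shows how such components are commonly assembled; the function names, shift fraction (fold_div), kernel sizes, and sigmoid gating are assumptions made here, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

def temporal_shift(x, n_segments, fold_div=8):
    # x: (N*T, C, H, W). Shift 1/fold_div of the channels one frame
    # backward and another 1/fold_div one frame forward in time; the
    # remaining channels stay in place (TSM-style shift, assumed here).
    nt, c, h, w = x.shape
    x = x.view(nt // n_segments, n_segments, c, h, w)
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                  # channels taken from the next frame
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # channels taken from the previous frame
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # unshifted channels
    return out.view(nt, c, h, w)

class StripPooling(nn.Module):
    # Long-range spatial context from 1xW and Hx1 average-pooled strips;
    # the fusion and gating layout below is an assumption.
    def __init__(self, channels):
        super().__init__()
        self.conv_h = nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1))
        self.conv_w = nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0))
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        _, _, h, w = x.shape
        row = self.conv_h(F.adaptive_avg_pool2d(x, (1, w))).expand(-1, -1, h, -1)  # 1xW strip over height
        col = self.conv_w(F.adaptive_avg_pool2d(x, (h, 1))).expand(-1, -1, -1, w)  # Hx1 strip over width
        gate = torch.sigmoid(self.fuse(F.relu(row + col)))
        return x * gate  # modulate features with long-range spatial context

if __name__ == "__main__":
    frames = torch.randn(2 * 8, 64, 56, 56)        # 2 clips x 8 frames, 64 channels
    frames = temporal_shift(frames, n_segments=8)
    frames = StripPooling(64)(frames)
    print(frames.shape)                            # torch.Size([16, 64, 56, 56])

In a residual block, the shift would typically be applied to the block input before its convolutions and the strip pooling to the output, so that temporal and long-range spatial cues are injected without altering the overall structure of the 2D backbone.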
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3428573