Video Behavior Recognition Model Based on Spatial Long-Distance Modeling Combined With Time-Domain Shift
| Published in: | IEEE Access, 2024, Vol. 12, pp. 99213-99222 |
|---|---|
| Main Authors: | , , |
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online Access: | Full text |
| Summary: | During the feature-extraction study, the video behavior recognition algorithm had a limited ability to extract long-range target and temporal motion information, resulting in unsatisfactory classification results. To enhance the network's expressive capability, this study proposes a video behavior recognition algorithm that combines spatial long-distance modeling with a temporal shift. To efficiently extract time-domain motion features in the 2D backbone network, the residual branch is coupled with a temporal shift module. At the same time, a long, narrow pooling kernel, namely a 1×N or N×1 strip pool, is introduced so that the backbone can effectively capture long-range information in the spatial domain and obtain the contextual relations of distant targets. Experiments on the Something-Something V1 and Jester datasets achieve average recognition accuracies of 45.82% and 96.89%, respectively. The results demonstrate that the proposed algorithm can fully extract the spatio-temporal features of videos and offers clear advantages over other existing behavior recognition networks. |
|---|---|
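The temporal shift idea referenced in the summary can be sketched roughly as follows. This is a simplified, framework-free illustration, not the paper's implementation: the actual model shifts channels of convolutional feature tensors inside residual blocks, whereas this sketch operates on plain lists, and the `fold_div` parameter name is an assumption.

```python
def temporal_shift(frames, fold_div=4):
    """Shift a fraction of channels along the time axis.

    frames: list of T frames, each a list of C channel values.
    The first C//fold_div channels are pulled from the next frame,
    the second C//fold_div from the previous frame, the rest stay put.
    Out-of-range positions are zero-padded.
    """
    T, C = len(frames), len(frames[0])
    fold = C // fold_div
    out = [[0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            if c < fold:          # shift backward in time: take from frame t+1
                src = t + 1
            elif c < 2 * fold:    # shift forward in time: take from frame t-1
                src = t - 1
            else:                 # unshifted channels
                src = t
            if 0 <= src < T:
                out[t][c] = frames[src][c]
    return out
```

Because the shift only moves data between neighboring frames, it adds temporal mixing to a 2D backbone at essentially zero extra computation, which is why it pairs naturally with a residual branch.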
| ISSN: | 2169-3536 |
| DOI: | 10.1109/ACCESS.2024.3428573 |
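The 1×N / N×1 strip pooling mentioned in the summary can likewise be sketched in a few lines. This is a minimal illustration under simplifying assumptions: each row is averaged with a 1×W kernel and each column with an H×1 kernel, and the two pooled maps are fused here by plain addition, whereas the actual module would apply learned convolutions and broadcasting over feature channels.

```python
def strip_pool(x):
    """Fuse horizontal and vertical strip-pooled views of a 2D map.

    x: H x W feature map as a list of lists.
    row_avg captures long-range horizontal context (1 x W strips),
    col_avg captures long-range vertical context (H x 1 strips).
    """
    H, W = len(x), len(x[0])
    row_avg = [sum(row) / W for row in x]                          # H values
    col_avg = [sum(x[i][j] for i in range(H)) / H for j in range(W)]  # W values
    # Broadcast each strip average back over the map and add the two views,
    # so every position sees context from its entire row and entire column.
    return [[row_avg[i] + col_avg[j] for j in range(W)] for i in range(H)]
```

Unlike a square pooling window, each output position aggregates information along an entire row and column, which is what lets the backbone relate spatially distant targets.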