SiamMAST: Siamese motion-aware spatio-temporal network for video action recognition
Published in: The Visual Computer, 2024-05, Vol. 40 (5), pp. 3163-3181
Main authors: , , , ,
Format: Article
Language: English
Online access: Full text
Abstract: This paper proposes a Siamese motion-aware spatio-temporal network (SiamMAST) for video action recognition. SiamMAST is designed around the fusion of four features obtained by processing video frames: the spatial, temporal, spatial dynamic, and temporal dynamic features of a moving target. SiamMAST comprises AlexNets as the backbone, LSTMs, and spatial motion-awareness and temporal motion-awareness sub-modules. RGB images are fed into the network, where the AlexNets extract spatial features; these are then fed into the LSTMs to generate temporal features. In addition, the spatial motion-awareness and temporal motion-awareness sub-modules are proposed to capture spatial and temporal dynamic features. Finally, all features are fused and fed into the classification layer. The final recognition result is produced by averaging the test label probabilities across a fixed number of RGB frames and selecting the label with the highest probability. The whole network is trained offline end-to-end on large-scale image datasets using the standard SGD algorithm with back-propagation. The proposed network is evaluated on two challenging datasets, UCF101 (93.53%) and HMDB51 (69.36%). The experiments demonstrate the effectiveness and efficiency of the proposed SiamMAST.
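The sketch below illustrates one possible reading of the pipeline described in the abstract: AlexNet features per frame, an LSTM for temporal features, motion-awareness sub-modules for dynamics, concatenation-based fusion, and test-time averaging of label probabilities over frames. It is a minimal PyTorch illustration only; the `MotionAwareness` module (modeled here as frame-difference projection), the fusion by concatenation, the feature dimensions, and the names `SiamMASTSketch` and `predict` are assumptions, not the paper's actual implementation.

```python
# Minimal sketch of the SiamMAST pipeline as summarized in the abstract.
# All module internals and sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class MotionAwareness(nn.Module):
    """Hypothetical motion-awareness sub-module: captures dynamics as
    frame-to-frame feature differences (an assumption for illustration)."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, feats):                 # feats: (B, T, D)
        diff = feats[:, 1:] - feats[:, :-1]   # temporal differences
        return self.proj(diff).mean(dim=1)    # pooled dynamic feature (B, D)


class SiamMASTSketch(nn.Module):
    def __init__(self, num_classes=101, hidden=512):
        super().__init__()
        # AlexNet backbone for per-frame spatial features (4096-d penultimate output).
        alexnet = models.alexnet(weights=None)
        self.backbone = nn.Sequential(
            alexnet.features, alexnet.avgpool, nn.Flatten(),
            *list(alexnet.classifier.children())[:-1])
        self.lstm = nn.LSTM(4096, hidden, batch_first=True)  # temporal features
        self.spatial_motion = MotionAwareness(4096)           # spatial dynamics
        self.temporal_motion = MotionAwareness(hidden)        # temporal dynamics
        self.classifier = nn.Linear(4096 + hidden + 4096 + hidden, num_classes)

    def forward(self, frames):                # frames: (B, T, 3, 224, 224)
        b, t = frames.shape[:2]
        spatial = self.backbone(frames.flatten(0, 1)).view(b, t, -1)  # (B, T, 4096)
        temporal, _ = self.lstm(spatial)                              # (B, T, hidden)
        fused = torch.cat([spatial.mean(dim=1),              # spatial features
                           temporal[:, -1],                  # temporal features
                           self.spatial_motion(spatial),     # spatial dynamics
                           self.temporal_motion(temporal)],  # temporal dynamics
                          dim=1)
        return self.classifier(fused)


@torch.no_grad()
def predict(model, clips):
    """Test-time rule from the abstract: average label probabilities over a
    fixed number of RGB-frame clips and pick the most probable label."""
    probs = torch.stack([model(c).softmax(dim=-1) for c in clips]).mean(dim=0)
    return probs.argmax(dim=-1)
```

Training would then follow the abstract's description, i.e. standard SGD with back-propagation over the fused classifier output; the specific loss and hyper-parameters are not given in this record.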
ISSN: 0178-2789, 1432-2315
DOI: 10.1007/s00371-023-03018-2