SiamMAST: Siamese motion-aware spatio-temporal network for video action recognition
Published in: The Visual Computer, 2024-05, Vol. 40 (5), pp. 3163-3181
Main authors: , , , ,
Format: Article
Language: English
Online access: Full text
Abstract: This paper proposes a Siamese motion-aware spatio-temporal network (SiamMAST) for video action recognition. SiamMAST is designed around the fusion of four features obtained by processing video frames: the spatial, temporal, spatial dynamic, and temporal dynamic features of a moving target. SiamMAST comprises AlexNets as the backbone, LSTMs, and spatial motion-awareness and temporal motion-awareness sub-modules. RGB images are fed into the network, where the AlexNets extract spatial features; these are then fed into the LSTMs to generate temporal features. In addition, the spatial motion-awareness and temporal motion-awareness sub-modules are proposed to capture spatial and temporal dynamic features. Finally, all features are fused and fed into the classification layer. The final recognition result is produced by averaging the test label probabilities across a fixed number of RGB frames and selecting the label with the highest probability. The whole network is trained offline end-to-end on large-scale image datasets using the standard SGD algorithm with back-propagation. The proposed network is evaluated on two challenging datasets, UCF101 (93.53%) and HMDB51 (69.36%). The experiments demonstrate the effectiveness and efficiency of the proposed SiamMAST.
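The sketch below illustrates one possible reading of the pipeline described in the abstract: AlexNet features per frame, an LSTM for temporal features, motion-awareness sub-modules for dynamics, concatenation-based fusion, and test-time averaging of label probabilities over frames. It is a minimal PyTorch illustration only; the `MotionAwareness` module (modeled here as frame-difference projection), the fusion by concatenation, the feature dimensions, and the names `SiamMASTSketch` and `predict` are assumptions, not the paper's actual implementation.

```python
# Minimal sketch of the SiamMAST pipeline as summarized in the abstract.
# All module internals and sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class MotionAwareness(nn.Module):
    """Hypothetical motion-awareness sub-module: captures dynamics as
    frame-to-frame feature differences (an assumption for illustration)."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, feats):                 # feats: (B, T, D)
        diff = feats[:, 1:] - feats[:, :-1]   # temporal differences
        return self.proj(diff).mean(dim=1)    # pooled dynamic feature (B, D)


class SiamMASTSketch(nn.Module):
    def __init__(self, num_classes=101, hidden=512):
        super().__init__()
        # AlexNet backbone for per-frame spatial features (4096-d penultimate output).
        alexnet = models.alexnet(weights=None)
        self.backbone = nn.Sequential(
            alexnet.features, alexnet.avgpool, nn.Flatten(),
            *list(alexnet.classifier.children())[:-1])
        self.lstm = nn.LSTM(4096, hidden, batch_first=True)  # temporal features
        self.spatial_motion = MotionAwareness(4096)           # spatial dynamics
        self.temporal_motion = MotionAwareness(hidden)        # temporal dynamics
        self.classifier = nn.Linear(4096 + hidden + 4096 + hidden, num_classes)

    def forward(self, frames):                # frames: (B, T, 3, 224, 224)
        b, t = frames.shape[:2]
        spatial = self.backbone(frames.flatten(0, 1)).view(b, t, -1)  # (B, T, 4096)
        temporal, _ = self.lstm(spatial)                              # (B, T, hidden)
        fused = torch.cat([spatial.mean(dim=1),              # spatial features
                           temporal[:, -1],                  # temporal features
                           self.spatial_motion(spatial),     # spatial dynamics
                           self.temporal_motion(temporal)],  # temporal dynamics
                          dim=1)
        return self.classifier(fused)


@torch.no_grad()
def predict(model, clips):
    """Test-time rule from the abstract: average label probabilities over a
    fixed number of RGB-frame clips and pick the most probable label."""
    probs = torch.stack([model(c).softmax(dim=-1) for c in clips]).mean(dim=0)
    return probs.argmax(dim=-1)
```

Training would then follow the abstract's description, i.e. standard SGD with back-propagation over the fused classifier output; the specific loss and hyper-parameters are not given in this record.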
ISSN: 0178-2789, 1432-2315
DOI: 10.1007/s00371-023-03018-2