An efficient motion visual learning method for video action recognition

Bibliographic Details
Published in: Expert Systems with Applications, 2024-12, Vol. 255, p. 124596, Article 124596
Authors: Wang, Bin; Chang, Faliang; Liu, Chunsheng; Wang, Wenqian; Ma, Ruiyi
Format: Article
Language: English
Online access: Full text
Description
Abstract: Efficient spatio-temporal information modeling is currently one of the key research problems in action recognition. Previous approaches focus on enhancing backbone features individually through hierarchical structures, but most of them fail to adequately balance the interaction of features within those structures. In this work, we propose an effective Multi-dimensional Adaptive Fusion Network (MDAF-Net), which can be embedded into mainstream action recognition backbones in a plug-and-play manner to fully activate the transfer and representation of action features in the deep network. Specifically, MDAF-Net contains two main components: the Adaptive Temporal Capture Module (ATCM) and the Extended Spatial and Channel Module (ESCM). The ATCM suppresses the over-expression of similar features in adjacent frames and activates the expression of motion-flow information. The ESCM further improves temporal modeling efficiency by extending the spatial feature receptive field and enhancing channel attention. Extensive experiments on several challenging action recognition benchmarks, such as Something-Something V1 & V2 and Kinetics-400, demonstrate that the proposed MDAF-Net achieves state-of-the-art or competitive performance.
ISSN: 0957-4174
DOI: 10.1016/j.eswa.2024.124596
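
As a rough illustration of what "plug-and-play" insertion into a 2D action recognition backbone can look like, the PyTorch-style sketch below combines a temporal-difference gate (to damp redundant adjacent-frame features) with squeeze-and-excitation channel attention. All class names, shapes, and operations here are illustrative assumptions and do not reflect the authors' ATCM/ESCM implementation, which the record does not describe in detail.

import torch
import torch.nn as nn


class TemporalDifferenceGate(nn.Module):
    # Illustrative temporal branch: emphasizes frame-to-frame changes, a common
    # way to suppress near-identical features in adjacent frames.
    def __init__(self, channels: int, num_frames: int):
        super().__init__()
        self.num_frames = num_frames
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch * num_frames, channels, H, W) as produced by a 2D backbone.
        nt, c, h, w = x.shape
        b = nt // self.num_frames
        x5 = x.view(b, self.num_frames, c, h, w)
        diff = torch.zeros_like(x5)
        diff[:, :-1] = x5[:, 1:] - x5[:, :-1]       # forward temporal difference
        gate = self.gate(diff.view(nt, c, h, w))    # per-pixel motion saliency
        return x * gate


class ChannelAttention(nn.Module):
    # Illustrative channel branch: squeeze-and-excitation style re-weighting.
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.fc(x)


class PlugAndPlayBlock(nn.Module):
    # Wraps an existing residual block and applies both branches before it, so the
    # backbone topology and feature shapes stay unchanged.
    def __init__(self, block: nn.Module, channels: int, num_frames: int):
        super().__init__()
        self.temporal = TemporalDifferenceGate(channels, num_frames)
        self.channel = ChannelAttention(channels)
        self.block = block

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(self.channel(self.temporal(x)))


if __name__ == "__main__":
    # Assumed usage: wrap one residual block of a torchvision ResNet-50.
    # 8 frames per clip, 256 channels, and 174 classes are illustrative choices.
    from torchvision.models import resnet50

    backbone = resnet50(num_classes=174)
    backbone.layer1[1] = PlugAndPlayBlock(backbone.layer1[1], channels=256, num_frames=8)
    clip = torch.randn(2 * 8, 3, 224, 224)   # (batch * frames, C, H, W)
    print(backbone(clip).shape)              # per-frame logits: torch.Size([16, 174])

Because both branches are multiplicative gates that preserve input and output shapes, a wrapped block can replace any residual block of the backbone without further changes, which is what makes this style of insertion plug-and-play; per-frame predictions would still need to be aggregated over the clip for a video-level score.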