Multi-modality learning for human action recognition

Bibliographic Details
Published in: Multimedia Tools and Applications, 2021-05, Vol. 80 (11), pp. 16185-16203
Authors: Ren, Ziliang; Zhang, Qieshi; Gao, Xiangyang; Hao, Pengyi; Cheng, Jun
Format: Article
Language: English
Online Access: Full Text
Description
Abstract: Multi-modality based human action recognition is a growing research topic, since multi-modal data provide richer and more complementary information than a single modality. However, it is difficult for multi-modality learning to effectively capture spatial-temporal information from entire RGB and depth sequences. In this paper, to obtain a better representation of spatial-temporal information, we propose a bidirectional rank pooling method to construct RGB Visual Dynamic Images (VDIs) and Depth Dynamic Images (DDIs). Furthermore, we design an effective segmentation convolutional network (ConvNet) architecture based on a multi-modality hierarchical fusion strategy for human action recognition. The proposed method achieves state-of-the-art results on the widely used NTU RGB+D, SYSU 3D HOI and UWA3D II datasets.
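
The record does not spell out how the dynamic images are built, but the general recipe for rank-pooled dynamic images is well documented. Below is a minimal NumPy sketch using the linear approximate-rank-pooling coefficients of Bilen et al. (CVPR 2016); the function names, frame shapes, and rescaling step are illustrative assumptions, and the authors' bidirectional rank pooling may differ in detail (e.g., exact SVR-based ranking or pooling over sequence segments rather than whole clips).

    import numpy as np

    def approx_rank_pool(frames):
        # frames: sequence of T frames, each H x W x C.
        # Approximate rank pooling (Bilen et al., CVPR 2016): a single
        # weighted sum of the frames with linear coefficients
        # alpha_t = 2t - T - 1, t = 1..T, so early frames get negative
        # weight and late frames positive weight, encoding temporal order.
        x = np.asarray(frames, dtype=np.float64)
        T = x.shape[0]
        alphas = 2.0 * np.arange(1, T + 1) - T - 1
        di = np.tensordot(alphas, x, axes=1)  # -> H x W x C "dynamic image"
        # Rescale to [0, 255] so the result can be fed to an image ConvNet.
        di = 255.0 * (di - di.min()) / max(np.ptp(di), 1e-8)
        return di.astype(np.uint8)

    def bidirectional_dynamic_images(frames):
        # Pool the sequence and its temporal reverse, yielding a forward
        # and a backward dynamic image per clip.
        return approx_rank_pool(frames), approx_rank_pool(frames[::-1])

    # Usage: a dummy 16-frame RGB clip; a depth clip (H x W x 1) would be
    # pooled the same way to obtain DDIs instead of VDIs.
    clip = [np.random.randint(0, 256, (112, 112, 3), dtype=np.uint8)
            for _ in range(16)]
    vdi_fwd, vdi_bwd = bidirectional_dynamic_images(clip)

The forward and backward images compress a whole sequence into two still images, which is what lets a standard image ConvNet consume long RGB and depth sequences.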
ISSN: 1380-7501
eISSN: 1573-7721
DOI: 10.1007/s11042-019-08576-z