SRNet: Structured Relevance Feature Learning Network from Skeleton Data for Human Action Recognition

Bibliographic Details
Published in: IEEE Access, 2019-01, Vol. 7, p. 1-1
Authors: Nie, Weizhi; Wang, Wei; Huang, Xiangdong
Format: Article
Language: English
Online access: Full text
Description
Abstract: In recent years, human action recognition based on skeleton information has drawn increasing attention with the publication of large-scale skeleton datasets. The most crucial factors for this task lie in two aspects: the intra-frame representation of joint co-occurrences and the inter-frame representation of the skeletons' temporal evolution. The most effective approaches focus on automatic feature extraction using deep learning. However, they ignore the structural information of skeleton joints and the correlation between different skeleton joints for human action recognition. In this paper, we do not simply treat the joint position information as unordered points. Instead, we propose a novel data reorganizing strategy to represent the global and local structure information of human skeleton joints. Meanwhile, we also employ data mirroring to strengthen the relationships between skeleton joints. Based on this design, we propose an end-to-end multi-dimensional CNN network (SRNet) that fully considers spatial and temporal information to learn the feature extraction transform function. Specifically, in this CNN network, we employ different convolution kernels on different dimensions to learn skeleton representations, making the most of human structural information to generate robust features. Finally, we compare with other state-of-the-art methods on action recognition datasets including NTU RGB+D, PKU-MMD, SYSU, UT-Kinect, and HDM05. The experimental results demonstrate the superiority of our method.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2019.2940281
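
The abstract above only sketches the core idea: arrange a skeleton sequence as a coordinates x frames x joints tensor, mirror the joint axis so structurally related joints sit next to each other, and apply different convolution kernels along the joint and frame dimensions. The following minimal PyTorch sketch illustrates that general pattern; it is not the authors' SRNet implementation, and the layer widths, kernel sizes, mirroring step, and the 25-joint/60-class setup are all assumptions made for demonstration.

```python
# Illustrative sketch only -- NOT the authors' released SRNet code.
# Shows the general idea from the abstract: treat a skeleton sequence as a
# (coords x frames x joints) tensor and use different convolution kernels on
# different dimensions for intra-frame (joint co-occurrence) and inter-frame
# (temporal evolution) feature learning. All sizes are assumptions.
import torch
import torch.nn as nn


class SkeletonConvSketch(nn.Module):
    def __init__(self, num_joints: int = 25, num_classes: int = 60):
        super().__init__()
        # Kernel spanning the joint axis only: intra-frame joint co-occurrence.
        self.spatial_conv = nn.Conv2d(3, 32, kernel_size=(1, 3), padding=(0, 1))
        # Kernel spanning the frame axis only: inter-frame temporal evolution.
        self.temporal_conv = nn.Conv2d(32, 64, kernel_size=(3, 1), padding=(1, 0))
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, frames, joints) -- 3D joint coordinates per frame.
        # Hypothetical "mirror" step: append the joint axis reversed so that
        # convolution kernels see each joint next to a structural counterpart.
        x = torch.cat([x, torch.flip(x, dims=[3])], dim=3)
        x = torch.relu(self.spatial_conv(x))
        x = torch.relu(self.temporal_conv(x))
        x = x.mean(dim=(2, 3))          # global average pool over time/joints
        return self.classifier(x)


if __name__ == "__main__":
    model = SkeletonConvSketch()
    clip = torch.randn(2, 3, 64, 25)    # 2 clips, 64 frames, 25 joints each
    print(model(clip).shape)            # torch.Size([2, 60])
```

Separating the kernels this way keeps the spatial and temporal receptive fields independent, which is the property the abstract attributes to SRNet's multi-dimensional design; the actual network additionally relies on the paper's data reorganizing strategy for global and local joint structure, which is not reproduced here.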