Action Recognition Scheme Based on Skeleton Representation With DS-LSTM Network

Skeleton-based human action recognition has been a popular research field during the past few years. With the help of cameras equipping deep sensors, such as the Kinect, human action can be represented by a sequence of human skeleton data. Inspired by the skeleton descriptors based on Lie group, a s...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on circuits and systems for video technology 2020-07, Vol.30 (7), p.2129-2140
Hauptverfasser: Jiang, Xinghao, Xu, Ke, Sun, Tanfeng
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Skeleton-based human action recognition has been a popular research field during the past few years. With the help of cameras equipping deep sensors, such as the Kinect, human action can be represented by a sequence of human skeleton data. Inspired by the skeleton descriptors based on Lie group, a spatial-temporal skeleton transformation descriptor (ST-STD) is proposed in this paper. The ST-STD describes the relative transformations of skeletons, including the rotation and translation during movement. It gives a comprehensive view of the skeleton in both spatial and temporal domain for each frame. To capture the temporal connections in the skeleton sequence, a denoising sparse long short term memory (DS-LSTM) network is proposed in this paper. The DS-LSTM is designed to deal with two problems in action recognition. First, to decrease the intra-class diversity, the spatial-temporal auto-encoder (STAE) is proposed in this paper to generate representations with higher abstractness. The denoising constraint and the sparsity constraint are applied on both spatial and temporal domain to enhance the robustness and to reduce action misalignment. Second, to model the action sequence, a three-layer LSTM structure is trained with STAE representations for temporal modeling and classification. The experiments are carried out on four popular datasets. The results show that our approach performs better than several existing skeleton-based action recognition methods, which prove the effectiveness of our method.
ISSN:1051-8215
1558-2205
DOI:10.1109/TCSVT.2019.2914137