Sequential Gesture Learning for Continuous Labanotation Generation Based on the Fusion of Graph Neural Networks

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2022-06, Vol. 32 (6), pp. 3722-3734
Authors: Xie, Ningwei; Miao, Zhenjiang; Zhang, Xiao-Ping; Xu, Wanru; Li, Min; Wang, Jiaji
Format: Article
Language: English
Abstract: Labanotation is a symbolic recording system for human movement and a powerful tool for preserving and disseminating folk dances and other performing arts. State-of-the-art automatic Labanotation generation uses end-to-end methods with sequence-based skeleton representations, which cannot capture the relationships between joints and bones in the skeleton and therefore cannot accurately describe continuous lower-limb movements such as dance steps. In this paper, we propose a novel double-stream fusion method of directed graph neural networks (DGNN) combined with connectionist temporal classification (CTC), namely DFGNN-CTC, for sequential fine-grained motion recognition such as Labanotation generation from unsegmented dance movement. First, we extract double-stream directed graph features, employing an orientation-normalized directed acyclic graph (ON-DAG) and an orientation-normalized temporal directed acyclic graph (ON-TDAG), to jointly model the spatiotemporal properties of movement recorded in motion capture data. Then, we design a CTC-based fusion-pooling module to fuse the spatial and temporal streams encoded by the two DGNNs. It concatenates and fuses the two streams into a discriminative description of each time step, from which per-time-step predictions of Laban gesture type are made; the CTC then searches for the optimal Laban symbol sequence corresponding to the elemental motions composing the movement. Through joint contextual spatiotemporal modeling, the method discriminates much more finely between similar Laban gestures with subtle spatial and temporal differences, and thus substantially outperforms existing methods for continuous Labanotation generation, which analyze only a single stream, either spatial or temporal. Experiments on two Labanotation-labelled motion capture datasets demonstrate the effectiveness of each component of the proposed method and its superiority over state-of-the-art methods, especially for lower-limb movements.
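The fusion-pooling-plus-CTC idea summarized above can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch rendering, not the paper's implementation: per-time-step features from the two DGNN streams are concatenated and fused, projected to Laban gesture scores (plus the CTC blank), trained with the standard CTC loss, and decoded greedily. The encoder outputs, feature dimension, vocabulary size, and all class/function names here are assumptions for illustration; the actual ON-DAG/ON-TDAG encoders are not reproduced.

```python
# Hypothetical sketch of a CTC-based fusion-pooling head in the spirit of
# the abstract. Dimensions, names, and the gesture vocabulary are assumed.
import torch
import torch.nn as nn


class FusionPoolingCTCHead(nn.Module):
    """Fuses spatial and temporal per-time-step features and emits
    log-probabilities over Laban gesture labels plus the CTC blank."""

    def __init__(self, feat_dim: int = 256, num_labels: int = 20):
        super().__init__()
        # Concatenate the two streams, then fuse with a linear projection.
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)
        # num_labels Laban gesture types + 1 CTC blank symbol (index 0).
        self.classifier = nn.Linear(feat_dim, num_labels + 1)

    def forward(self, spatial: torch.Tensor, temporal: torch.Tensor):
        # spatial, temporal: (T, batch, feat_dim) features from the two
        # stream encoders (stand-ins for the DGNN outputs).
        fused = torch.relu(self.fuse(torch.cat([spatial, temporal], dim=-1)))
        # CTC expects log-probabilities of shape (T, batch, num_classes).
        return self.classifier(fused).log_softmax(dim=-1)


def greedy_decode(log_probs: torch.Tensor) -> list:
    """Greedy CTC decode: collapse repeated labels, drop blanks (index 0)."""
    best = log_probs.argmax(dim=-1)  # (T, batch)
    seqs = []
    for b in range(best.shape[1]):
        prev, out = 0, []
        for t in best[:, b].tolist():
            if t != prev and t != 0:
                out.append(t)
            prev = t
        seqs.append(out)
    return seqs


# Usage: train against unsegmented Laban symbol sequences with CTC loss.
T, B, D, L = 120, 4, 256, 20
head = FusionPoolingCTCHead(D, L)
log_probs = head(torch.randn(T, B, D), torch.randn(T, B, D))
targets = torch.randint(1, L + 1, (B, 12))        # label ids, no blank (0)
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 12, dtype=torch.long)
loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
symbols = greedy_decode(log_probs)
```

At inference time, the greedy decode above (or a beam search, as a CTC "optimal sequence" search would use) turns the per-time-step predictions into the final Laban symbol sequence.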
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2021.3109892