Continuous sign language recognition based on iterative alignment network and attention mechanism

Bibliographic details
Published in:	Multimedia Tools and Applications, May 2023, Vol. 82, No. 11, pp. 17195-17212
Authors:	Xue, Cuihong; Yu, Ming; Yan, Gang; Gao, Yang; Liu, Yuehao
Format:	Article
Language:	English
Keywords:	Alignment; Coders; Computer Communication Networks; Computer Science; Data Structures and Information Theory; Decoding; Encoders-Decoders; Labels; Machine learning; Modules; Multimedia Information Systems; Neural networks; Parameters; Recognition; Sign language; Special Purpose and Application-Based Systems; Supervised learning

Description
Abstract:	The biggest challenge in continuous sign language recognition is the weak supervision provided by sign language labels: a label gives the gloss sequence for a whole video but not the temporal boundaries of each sign. This paper proposes a continuous sign language recognition framework based on an iterative alignment network and an attention mechanism to address this problem. The iterative alignment network uses a spatial-temporal residual network (STRN) to extract block-level features, a temporal convolutional module (TCM) to strengthen the temporal correlation between block-level features, and a bidirectional gated recurrent unit (BGRU) with connectionist temporal classification (CTC) to generate a pseudo-label for each block-level feature. These pseudo-labels are then used as strong supervision to train the STRN and TCM and optimize their parameters; in the next iteration, CTC learns a new mapping based on the optimized parameters. The word-level features produced by the iterative alignment network are then fed into an attention-based encoder-decoder network. During decoding, the attention module focuses on the time steps of the input feature sequence that are relevant to the current output, which yields more accurate decoding results. The method is evaluated on three large-scale continuous sign language datasets (RWTH-Phoenix-Weather 2014, CSL and CSL-Daily), and the experimental results demonstrate its effectiveness.
ISSN:	1380-7501, 1573-7721
DOI:	10.1007/s11042-022-14085-3
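The abstract does not say how the BGRU/CTC output is turned into one pseudo-label per block-level feature. One common way to do this, assumed here and not confirmed by the paper, is CTC forced alignment. This is a Viterbi search over the blank-augmented gloss sequence that returns the single most likely per-step path. The sketch below uses plain Python. The function name `ctc_forced_align`, the blank index 0 and the log-probability input format are all hypothetical.

```python
import math
from typing import List, Sequence

BLANK = 0  # assumption: vocabulary index 0 is the CTC blank


def ctc_forced_align(log_probs: Sequence[Sequence[float]],
                     target: Sequence[int]) -> List[int]:
    """Return one label per time step (BLANK or a gloss id) for the most
    likely CTC path that collapses to `target` (Viterbi forced alignment).

    log_probs[t][v]: log-probability of vocabulary entry v at block t,
    e.g. from a BGRU + softmax classifier (hypothetical input format).
    """
    T = len(log_probs)
    # Blank-augmented target: [blank, g1, blank, g2, ..., gL, blank]
    ext = [BLANK]
    for gloss in target:
        ext += [gloss, BLANK]
    S = len(ext)

    # Repeated glosses need a blank in between, so each costs an extra step.
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    if T == 0 or T < len(target) + repeats:
        raise ValueError("too few time steps for this target")

    NEG = -math.inf
    score = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    # A valid path starts on the leading blank or on the first gloss.
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Allowed moves: stay, advance by one, or skip a blank
            # (a skip may not land on a blank or jump between equal glosses).
            options = [(score[t - 1][s], s)]
            if s >= 1:
                options.append((score[t - 1][s - 1], s - 1))
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                options.append((score[t - 1][s - 2], s - 2))
            best, prev = max(options)
            if best > NEG:
                score[t][s] = best + log_probs[t][ext[s]]
                back[t][s] = prev

    # A valid path ends on the last gloss or on the trailing blank.
    ends = [S - 1] if S == 1 else [S - 1, S - 2]
    s = max(ends, key=lambda e: score[T - 1][e])
    if score[T - 1][s] == NEG:
        raise ValueError("no valid alignment")

    # Follow back-pointers to recover the label at every time step.
    path = [BLANK] * T
    for t in range(T - 1, -1, -1):
        path[t] = ext[s]
        s = back[t][s]
    return path
```

In an iterative scheme like the one described, `path[t]` could serve as the classification target for block `t` when retraining the feature extractor. How blank steps are treated is a design choice that the abstract leaves open.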
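The abstract does not specify which attention scoring function the encoder-decoder uses. For illustration only, the sketch below assumes Luong-style dot-product attention for a single decoding step. The decoder state scores each encoder time step, a softmax turns the scores into weights, and the context vector is the weighted sum of the encoder outputs. The name `dot_attention` and its list-of-floats inputs are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dot_attention(query: Sequence[float],
                  keys: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """One decoding step of dot-product attention.

    query: current decoder hidden state (length d).
    keys:  encoder outputs, one length-d vector per input time step.
    Returns (weights over time steps, context vector of length d).
    """
    # Alignment score of the decoder state against each encoder time step.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Context vector = attention-weighted sum of the encoder outputs.
    dim = len(keys[0])
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return weights, context
```

In a typical attention decoder, the context vector is combined with the decoder state before predicting the next gloss. The abstract does not state how the paper combines them.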