Multi-Information Spatial-Temporal LSTM Fusion Continuous Sign Language Neural Machine Translation

Bibliographic Details
Published in: IEEE Access, 2020, Vol. 8, pp. 216718-216728
Main Authors: Xiao, Qinkun; Chang, Xin; Zhang, Xue; Liu, Xing
Format: Article
Language: English
Description
Abstract: There are two basic problems in sign language recognition (SLR): (a) isolated-word SLR and (b) continuous SLR. Most existing continuous SLR methods are extensions of isolated-word SLR methods: they use isolated-word SLR as the basic module and obtain sentence recognition results through sentence segmentation and word alignment. However, sentence segmentation and word alignment are often inaccurate, resulting in low sentence recognition accuracy. At the same time, continuous SLR usually requires strict sample labels, making manual labeling laborious and limiting the available training data. To address these challenges, this paper proposes a bidirectional spatial-temporal LSTM fusion attention network (Bi-ST-LSTM-A) for continuous SLR, an approach that avoids sentence segmentation, word alignment, and tedious manual labeling. The contributions are as follows: (1) a sign language video feature representation method based on convolutional neural network (CNN) and spatial-temporal LSTM (ST-LSTM) information fusion; and (2) a uniform neural machine translation framework that can be used for complex continuous SLR and for gesture recognition of non-specific signers in non-specific environments. Experiments were carried out on three large continuous sign language datasets: recognition accuracy reached 81.22% on the 500 CSL dataset, 76.12% on the RWTH-PHOENIX-Weather dataset, and 75.32% on the RWTH-PHOENIX-Weather-2014T dataset, illustrating the effectiveness of the proposed framework.
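
The abstract sketches the Bi-ST-LSTM-A pipeline only at a high level: per-frame CNN features, bidirectional spatial-temporal LSTM fusion over time, and an attention-based decoder that emits the sentence word by word. The minimal PyTorch sketch below illustrates that general pipeline shape only; the ResNet-18 backbone, layer sizes, single-head attention, and all module names are assumptions made for illustration, not the authors' implementation.

# Illustrative sketch (not the authors' code) of the pipeline described in the
# abstract: per-frame CNN features -> bidirectional temporal LSTM fusion ->
# attention-based word decoder. Backbone, dimensions, and names are assumed.
import torch
import torch.nn as nn
import torchvision.models as models

class BiSTLSTMTranslator(nn.Module):
    def __init__(self, vocab_size, feat_dim=512, hidden_dim=256):
        super().__init__()
        # Spatial feature extractor: a CNN applied to each video frame.
        backbone = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # (B*T, 512, 1, 1)
        # Temporal fusion: bidirectional LSTM over the frame feature sequence.
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        # Single-head dot-product attention over the encoder states.
        self.attn = nn.MultiheadAttention(2 * hidden_dim, num_heads=1,
                                          batch_first=True)
        # Word decoder: LSTM over embeddings of previously emitted words.
        self.embed = nn.Embedding(vocab_size, 2 * hidden_dim)
        self.decoder = nn.LSTM(2 * hidden_dim, 2 * hidden_dim, batch_first=True)
        self.out = nn.Linear(4 * hidden_dim, vocab_size)

    def forward(self, frames, prev_words):
        # frames: (B, T, 3, H, W) video clip; prev_words: (B, L) word ids.
        B, T = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).flatten(1)        # (B*T, 512)
        feats = feats.view(B, T, -1)                              # (B, T, 512)
        enc_states, _ = self.encoder(feats)                       # (B, T, 2H)
        dec_states, _ = self.decoder(self.embed(prev_words))      # (B, L, 2H)
        context, _ = self.attn(dec_states, enc_states, enc_states)
        return self.out(torch.cat([dec_states, context], dim=-1))  # (B, L, V)

# Usage sketch: an 8-frame clip translated against a 1000-word vocabulary.
model = BiSTLSTMTranslator(vocab_size=1000)
logits = model(torch.randn(2, 8, 3, 112, 112), torch.randint(0, 1000, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 1000])

In the setting the abstract describes, such a translator would be trained with sentence-level labels only, which is what lets the framework sidestep explicit sentence segmentation and word alignment.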
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3039539