Multi-Information Spatial-Temporal LSTM Fusion Continuous Sign Language Neural Machine Translation
Published in: IEEE Access, 2020, Vol. 8, pp. 216718-216728
Main authors: , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: There are two basic problems in sign language recognition (SLR): (a) isolated-word SLR and (b) continuous SLR. Most existing continuous SLR methods are extensions of isolated-word SLR methods: they use isolated-word SLR as a basic module and obtain sentence-level results through sentence segmentation and word alignment. However, sentence segmentation and word alignment are often inaccurate, resulting in low sentence recognition accuracy. At the same time, continuous SLR usually requires strict sample labels, which makes manual labeling difficult and limits the available training data. To address these challenges, this paper proposes a bidirectional spatial-temporal LSTM fusion attention network (Bi-ST-LSTM-A) for continuous SLR. This approach avoids sentence segmentation, word alignment, and tedious manual labeling. Our contributions are summarized as follows: (1) we propose a sign language video feature representation method based on a convolutional neural network (CNN) and spatial-temporal LSTM (ST-LSTM) information fusion; and (2) we construct a uniform neural machine translation framework that can be used for complex continuous SLR and gesture recognition of non-specific signers in non-specific environments. Experiments were carried out on several large continuous sign language datasets. The recognition accuracy reached 81.22% on the 500 CSL dataset, 76.12% on the RWTH-PHOENIX-Weather dataset, and 75.32% on the RWTH-PHOENIX-Weather-2014T dataset, illustrating the effectiveness of the proposed framework.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3039539
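
The abstract describes an encoder-decoder pipeline: per-frame CNN features fused by a (bidirectional) spatial-temporal LSTM encoder, followed by an attention-based translation decoder that emits the word sequence directly, without sentence segmentation or word alignment. As a rough illustration of that general shape only (not the authors' implementation; the class name, dimensions, and the simple additive attention below are assumptions), a minimal PyTorch sketch might look like this:

```python
# Illustrative sketch: bidirectional LSTM encoder over per-frame CNN features,
# attention-based LSTM decoder producing a word/gloss sequence.
# All names and sizes are hypothetical, not taken from the paper.
import torch
import torch.nn as nn


class BiLSTMAttentionTranslator(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, vocab_size=1000):
        super().__init__()
        # Encoder: bidirectional LSTM over the frame-feature sequence.
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        # Decoder: LSTM cell fed the previous word embedding plus an
        # attention context vector over the encoder outputs.
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.decoder = nn.LSTMCell(hidden_dim + 2 * hidden_dim, hidden_dim)
        self.attn_score = nn.Linear(hidden_dim + 2 * hidden_dim, 1)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frame_feats, target_words):
        # frame_feats: (batch, num_frames, feat_dim), e.g. from a 2D CNN backbone
        # target_words: (batch, num_words) word indices (teacher forcing)
        enc_out, _ = self.encoder(frame_feats)            # (B, T, 2H)
        B, T, _ = enc_out.shape
        h = frame_feats.new_zeros(B, self.decoder.hidden_size)
        c = frame_feats.new_zeros(B, self.decoder.hidden_size)
        logits = []
        for t in range(target_words.size(1)):
            # Score each encoder step against the current decoder state.
            scores = self.attn_score(
                torch.cat([h.unsqueeze(1).expand(B, T, -1), enc_out], dim=-1)
            ).squeeze(-1)                                  # (B, T)
            context = (scores.softmax(dim=1).unsqueeze(-1) * enc_out).sum(1)
            inp = torch.cat([self.embed(target_words[:, t]), context], dim=-1)
            h, c = self.decoder(inp, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                  # (B, num_words, vocab)


# Example: 4 clips of 60 frames with 512-d features, 8-word target sentences.
model = BiLSTMAttentionTranslator()
word_logits = model(torch.randn(4, 60, 512), torch.randint(0, 1000, (4, 8)))
```

In the paper's framework, the plain encoder above would be replaced by the proposed CNN + ST-LSTM information-fusion representation; the sketch is only meant to show how an attention decoder can translate an unsegmented video feature sequence directly into a sentence.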