Novel Spatio-Temporal Continuous Sign Language Recognition Using an Attentive Multi-Feature Network

Bibliographic details
Published in:	Sensors (Basel, Switzerland), 2022-08, Vol.22 (17), p.6452
Main authors:	Aditya, Wisnu; Shih, Timothy K.; Thaipisutikul, Tipajin; Fitriajie, Arda Satata; Gochoo, Munkhjargal; Utaminingrum, Fitri; Lin, Chih-Yang
Format:	Article
Language:	English
Subjects:	Accuracy; Analysis; Body image; continuous sign language; Datasets; Frames (data processing); keypoints; multi-feature; Self image; self-attention; Sign language; spatial; Streaming media; temporal; Video data
Online access:	Full text

Description
Summary:	Given video streams, we aim to correctly detect unsegmented signs related to continuous sign language recognition (CSLR). Despite the increase in proposed deep learning methods in this area, most of them focus on using only an RGB feature, either the full-frame image or details of the hands and face. The scarcity of information for the CSLR training process heavily constrains the capability to learn multiple features from the video input frames. Moreover, exploiting all frames in a video for the CSLR task could lead to suboptimal performance since each frame contains a different level of information, including main features in the inferencing of noise. Therefore, we propose a novel spatio-temporal continuous sign language recognition method using an attentive multi-feature network to enhance CSLR by providing extra keypoint features. In addition, we exploit the attention layer in the spatial and temporal modules to simultaneously emphasize multiple important features. Experimental results on both CSLR datasets demonstrate that the proposed method achieves superior performance in comparison with current state-of-the-art methods by 0.76 and 20.56 for the WER score on the CSL and PHOENIX datasets, respectively.
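
The WER (word error rate) score cited in the summary is conventionally the word-level edit distance between the recognized sequence and the reference sequence, divided by the reference length and expressed as a percentage. The sketch below assumes that standard definition; the function name `word_error_rate` and the example glosses are illustrative and are not taken from the article.

```python
def word_error_rate(reference, hypothesis):
    """Return the WER in percent between two gloss sequences (lists of strings).

    Assumes the conventional definition:
    (substitutions + deletions + insertions) / len(reference) * 100.
    """
    if not reference:
        raise ValueError("reference must contain at least one gloss")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j glosses of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_gloss in enumerate(reference, start=1):
        curr = [i]  # deleting all i reference glosses to match an empty hypothesis
        for j, hyp_gloss in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (ref_gloss != hyp_gloss)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(reference)


# Hypothetical gloss sequences: one substitution in a four-gloss reference.
reference = ["I", "GO", "SCHOOL", "TOMORROW"]
hypothesis = ["I", "GO", "HOME", "TOMORROW"]
print(word_error_rate(reference, hypothesis))  # 25.0
```

Lower values are better, and under this definition the score can exceed 100 when the hypothesis contains many insertions.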
ISSN:	1424-8220
DOI:	10.3390/s22176452