A Comprehensive Study on Deep Learning-Based Methods for Sign Language Recognition

Bibliographic details
Published in:	IEEE Transactions on Multimedia, 2022, Vol. 24, pp. 1750-1762
Main authors:	Adaloglou, Nikolas; Chatzis, Theocharis; Papastratis, Ilias; Stergioulas, Andreas; Papadopoulos, Georgios Th.; Zacharopoulou, Vassia; Xydopoulos, George J.; Atzakas, Klimnis; Papazachariou, Dimitris; Daras, Petros
Format:	Article
Language:	English
Subjects:	Annotations; Artificial neural networks; Assistive technology; Computer vision; conditional entropy CTC; Datasets; Deep neural networks; Feature extraction; Gesture recognition; Gloss; Greek sign language; Hidden Markov models; Machine learning; Sign language; Sign Language Recognition; Speech recognition; stimulated CTC; Task analysis; Three-dimensional displays; Training; Video data

Description
Summary:	In this paper, a comparative experimental assessment of computer vision-based methods for sign language recognition is conducted. By implementing the most recent deep neural network methods in this field, a thorough evaluation on multiple publicly available datasets is performed. The aim of the present study is to provide insights on sign language recognition, focusing on mapping non-segmented video streams to glosses. For this task, two new sequence training criteria, known from the fields of speech and scene text recognition, are introduced. Furthermore, a plethora of pretraining schemes is thoroughly discussed. Finally, a new RGB+D dataset for the Greek sign language is created. To the best of our knowledge, this is the first sign language dataset where three annotation levels are provided (individual gloss, sentence and spoken language) for the same set of video captures.
ISSN:	1520-9210 1941-0077
DOI:	10.1109/TMM.2021.3070438