Isolated Video-Based Arabic Sign Language Recognition Using Convolutional and Recursive Neural Networks

Bibliographic Details
Published in: Arabian Journal for Science and Engineering, 2022-02, Vol. 47 (2), pp. 2187-2199
Main Authors: Boukdir, Abdelbasset; Benaddy, Mohamed; Ellahyani, Ayoub; Meslouhi, Othmane El; Kardouchi, Mustapha
Format: Article
Language: English
Subjects:
Online Access: Full text
Description
Abstract: Sign language is essential for natural and meaningful communication between the deaf community and the hearing population. Most Arabic sign language recognition studies have focused on identifying sign actions from hand-crafted feature descriptors. However, this traditional approach is limited by the need to choose which features matter for each particular sequence. To address this issue, we propose a novel approach based on deep learning architectures to classify video sequences of Arabic sign language, specifically Moroccan sign language. Two classification methods are applied: a 2D convolutional recurrent neural network (2DCRNN) and a 3D convolutional neural network (3DCNN). In the first method, the 2DCRNN model extracts per-frame features and passes them through a recurrent network to capture the relationships between frames. The second method uses a 3DCNN model that learns spatiotemporal features from small patches. After the 2DCRNN and 3DCNN models extract features, a fully connected network classifies the video data into the various sign classes. The proposed approach is trained on a collection of 224 videos of five individuals performing 56 different signs. Results obtained with fourfold cross-validation demonstrate the performance of the proposed approach in terms of recall, F1 score, and AUROC, with accuracy levels of 92% for 2DCRNN and 99% for 3DCNN.
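The abstract contrasts two data flows: the 2DCRNN applies a 2D feature extractor frame by frame and then a recurrent pass over the frame sequence, while the 3DCNN consumes the whole clip as one spatiotemporal volume. The shape-level sketch below illustrates that distinction only; all layer sizes, weight matrices, and the plain linear stand-ins for the CNN/RNN components are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy video clip: (frames T, height H, width W, channels C); sizes are illustrative.
T, H, W, C, n_classes = 16, 32, 32, 3, 56
video = rng.standard_normal((T, H, W, C))

# --- 2DCRNN-style flow: per-frame 2D feature extractor, then a recurrent pass ---
feat_dim, hidden = 64, 32
W_feat = rng.standard_normal((H * W * C, feat_dim)) * 0.01  # stand-in for a 2D CNN
W_x = rng.standard_normal((feat_dim, hidden)) * 0.01
W_h = rng.standard_normal((hidden, hidden)) * 0.01

frame_feats = video.reshape(T, -1) @ W_feat                 # (T, feat_dim)
h = np.zeros(hidden)
for x in frame_feats:                                       # simple RNN step per frame
    h = np.tanh(x @ W_x + h @ W_h)

W_cls = rng.standard_normal((hidden, n_classes)) * 0.01     # fully connected classifier
logits_2dcrnn = h @ W_cls                                   # (n_classes,)

# --- 3DCNN-style flow: one spatiotemporal feature over the whole clip ---
W_3d = rng.standard_normal((T * H * W * C, n_classes)) * 0.001  # stand-in for 3D convs
logits_3dcnn = video.reshape(-1) @ W_3d                     # (n_classes,)

print(logits_2dcrnn.shape, logits_3dcnn.shape)              # (56,) (56,)
```

In a real implementation the linear stand-ins would be replaced by convolutional and recurrent layers, but the interface is the same: both paths reduce a (T, H, W, C) video to a vector of 56 class scores, one per sign.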
ISSN:2193-567X
1319-8025
2191-4281
DOI:10.1007/s13369-021-06167-5