Combinational sign language recognition

Bibliographic details
Published in:	Computer vision and image understanding 2024-04, Vol.241, p.103972, Article 103972
Main authors:	Gao, Liqing; Feng, Wei; Lyu, Fan; Wan, Liang
Format:	Article
Language:	English
Keywords:	Combinational learning; Context passing; Feature insertion; Location prediction; Sign language recognition (SLR)
Online access:	Full text

Description
Abstract:	Traditional Sign Language Recognition (SLR) is limited by the small scale of sign language (SL) datasets, which can lead to over-fitting to narrow contexts and applications. To address this problem, we propose, for the first time, a Combinational Sign Language Recognition (CombSLR) framework, which serves as an augmentation that extends existing datasets by combining continuous videos (called Template) and isolated videos (called Entity). The CombSLR framework is trained on combinational SL data (T & E) and applied to continuous SL data. However, because the combination location is unknown and the context of any T-E pair is inconsistent, naively inserting E into T is infeasible. To tackle this issue, we propose a simple yet effective method named EinT, which contains two main modules: (1) Location Candidate Prediction, which produces a reliable insertion location by considering the inter-frame relationship and makes the network end-to-end trainable; and (2) Feature Insertion via Context Passing, which eliminates the context inconsistency between T and E features. EinT is easily compatible with existing SLR models and implements data augmentation at the feature level during the training stage. We conduct extensive experiments on multiple publicly available sign language datasets, e.g., CCLS, CSL+DEVISIGN-D and CSL-Daily+DEVISIGN-D. The experimental results show that CombSLR can significantly improve existing SLR methods, e.g., improving the WER metric by 15.1% on average on the CCLS dataset and by 6.4% on the CSL dataset, which demonstrates the superiority of the CombSLR framework.
• We propose a novel and general Combinational Sign Language Recognition (CombSLR) framework, which for the first time serves as a data augmentation method to address the limited scale of SL data.
• Within the CombSLR framework, we propose the EinT method, which reliably inserts E into T at the feature level to combine T and E effectively.
• Extensive experiments on three public datasets demonstrate the effectiveness of the proposed EinT method, which can be embedded into any SLR model to improve its performance.
ISSN:	1077-3142 1090-235X
DOI:	10.1016/j.cviu.2024.103972
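
Note on the WER metric: the abstract reports its results in terms of word error rate (WER). As a reading aid, the following is a minimal sketch of how WER is conventionally computed over gloss sequences, namely the edit distance between reference and hypothesis divided by the reference length. It is not taken from the article; the function names `edit_distance` and `wer` and the example glosses are assumptions made for illustration, and the authors' evaluation script may differ in details such as tokenization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (illustrative helper)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: (substitutions + deletions + insertions) / len(ref)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical gloss sequences, invented for this example only.
reference = ["I", "LIKE", "TEA"]
hypothesis = ["I", "TEA"]
score = wer(reference, hypothesis)  # one deletion over three reference tokens
```

Lower WER is better, so an improvement in WER corresponds to a lower value of this ratio.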