Efficient sign language recognition system and dataset creation method based on deep learning and image processing
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Keywords: | |
Abstract: | New deep-learning architectures are created every year, achieving
state-of-the-art results in image recognition and leading to the belief that,
in a few years, complex tasks such as sign language translation will be
considerably easier, serving as a communication tool for the hearing-impaired
community. However, these algorithms still need large amounts of training
data, and creating a dataset is expensive and time-consuming. This work
therefore investigates digital image processing and machine learning
techniques that can be used to create a sign language dataset effectively.
We examine data acquisition choices, namely the frame rate (frames per
second) at which videos are captured or subsampled, the background type,
preprocessing, and data augmentation. We use convolutional neural networks
and object detection to build an image classifier and compare the results
using statistical tests. Different datasets were created to test the hypotheses,
containing 14 everyday words recorded with different smartphones in the RGB
color space. We achieved an accuracy of 96.38% on the test set and 81.36% on
the validation set, which contains more challenging conditions. The results
show that 30 FPS is the best subsampling frame rate for training the
classifier, that geometric transformations work better than intensity
transformations, and that artificial background creation does not improve
model generalization. These trade-offs should be considered in future work as
a guideline for weighing computational cost against accuracy gain when
creating a dataset and training a sign recognition model. |
---|---|
DOI: | 10.48550/arxiv.2103.12233 |
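
The abstract mentions subsampling videos to a lower frame rate before training. A minimal sketch of that idea follows, assuming frames are kept at evenly spaced positions. The function name `subsample_indices` and its parameters are hypothetical and are not taken from the paper, which may use a different procedure.

```python
def subsample_indices(n_frames: int, source_fps: float, target_fps: float) -> list[int]:
    """Return the indices of the frames to keep when reducing a clip
    from source_fps to target_fps (hypothetical helper, not the paper's code)."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps >= source_fps:
        # Nothing to drop: the clip is already at or below the target rate.
        return list(range(n_frames))

    step = source_fps / target_fps  # source frames per kept frame
    indices = []
    k = 0
    while True:
        idx = int(k * step)  # nearest earlier source frame
        if idx >= n_frames:
            break
        indices.append(idx)
        k += 1
    return indices
```

For example, `subsample_indices(6, 30, 10)` returns `[0, 3]`: every third frame of a six-frame clip recorded at 30 FPS is kept to approximate 10 FPS.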