New Feature Extraction Approaches Based on Spatial Points for Visual-Only Lip-Reading

Bibliographic Details
Published in: Traitement du signal 2022-04, Vol. 39 (2), pp. 659-668
Main authors: Tung, Hamdullah; Tekin, Ramazan
Format: Article
Language: English
Online access: Full text
Description
Abstract: Speech is both produced and perceived through the joint use of vision and hearing. The visual component of speech plays an important role in lip-reading, especially when the audio is distorted or unavailable. Visual-only lip-reading is a harder problem than audio-visual lip-reading. In this study, three new spatial feature approaches for visual-only lip-reading are presented. To test the proposed feature extraction approaches, three datasets were used: AVLetters2 (letters), AVDigits (digits), and AVLetAVDig (a combination of the two). First, the facial region and the lips were segmented, and the lip boundary was marked with 20 points. Then, based on these spatial points, feature vectors were obtained with three approaches, named Symmetric Euclidean Distance (SED), Central Euclidean Distance (CED), and Triple Points Angles (TPA). The extracted feature vectors were fed to a CNN-LSTM network to classify 26 letters and 10 digits. The best results for the AVLetters2, AVDigits, and AVLetAVDig datasets were obtained with the SED+CNN+LSTM method, at 53.2%, 81.6%, and 59.8% accuracy, respectively. Compared with previous studies on the same datasets, these results are considerably higher.
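The record names SED, CED, and TPA but does not give their formulas. As a rough illustration only, the Python sketch below shows one plausible way such features could be computed from the 20 lip-boundary points of a single frame; the point ordering, the mirror pairing assumed for SED, and the cyclic-neighbour convention assumed for TPA are illustrative assumptions, not the paper's exact definitions.

import numpy as np

def sed_features(pts):
    """Symmetric Euclidean Distance (SED) sketch: distance between each
    lip-boundary point and its assumed mirror partner (point i paired
    with point 19 - i across the lip's vertical axis of symmetry)."""
    return np.linalg.norm(pts - pts[::-1], axis=1)[:10]

def ced_features(pts):
    """Central Euclidean Distance (CED) sketch: distance from each of
    the 20 points to the lip centroid."""
    center = pts.mean(axis=0)
    return np.linalg.norm(pts - center, axis=1)

def tpa_features(pts):
    """Triple Points Angles (TPA) sketch: the angle at each point formed
    with its two neighbours along the lip contour (taken cyclically,
    an assumed convention)."""
    prev_v = np.roll(pts, 1, axis=0) - pts   # vector to previous contour point
    next_v = np.roll(pts, -1, axis=0) - pts  # vector to next contour point
    cos = np.sum(prev_v * next_v, axis=1) / (
        np.linalg.norm(prev_v, axis=1) * np.linalg.norm(next_v, axis=1))
    return np.arccos(np.clip(cos, -1.0, 1.0))

# Example: build one frame's feature vector from random stand-in points.
rng = np.random.default_rng(0)
pts = rng.random((20, 2))                    # 20 (x, y) lip-boundary points
frame_vec = np.concatenate([sed_features(pts),
                            ced_features(pts),
                            tpa_features(pts)])
print(frame_vec.shape)                       # (50,) = 10 SED + 20 CED + 20 TPA

In the paper's pipeline, such per-frame vectors would be stacked over a video's frames before being passed to the CNN-LSTM classifier; the exact temporal stacking and network configuration are not given in this record.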
ISSN: 0765-0019, 1958-5608
DOI: 10.18280/ts.390229