Research on a Lip Reading Algorithm Based on Efficient-GhostNet

Bibliographic details
Published in: Electronics (Basel) 2023-03, Vol.12 (5), p.1151
Main authors: Zhang, Gaoyan; Lu, Yuanyao
Format: Article
Language: English
Description
Abstract: Lip reading technology analyzes the visual information of a speaker’s mouth movements to recognize the content of the speech. As an important aspect of human–computer interaction, lip reading has gradually become popular with the development of deep learning in recent years. At present, most lip reading networks are very complex, with large numbers of parameters and heavy computation, and the trained models occupy a large amount of memory, which creates difficulties for devices with limited storage capacity and computing power, such as mobile terminals. To address these problems, this paper improves the lightweight network GhostNet and proposes Efficient-GhostNet, which improves performance while reducing the number of parameters through a local cross-channel interaction strategy without dimensionality reduction. The improved Efficient-GhostNet extracts spatial features of the lips; the extracted features are then fed to a GRU network to obtain the temporal features of the lip sequences, which are finally used for prediction. The dataset in this paper was recorded with Asian volunteers and augmented through angle transformation, with the recording deflected by 15 degrees to the left and to the right, in order to enhance the robustness of the network, reduce the influence of other factors, and improve the generalization ability of the model so that it is more consistent with real-life recognition scenarios. Experiments show that the improved Efficient-GhostNet + GRU model reduces the number of parameters while achieving comparable accuracy.
ISSN:2079-9292
DOI:10.3390/electronics12051151
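
The abstract mentions a local cross-channel interaction strategy without dimensionality reduction but gives no implementation details. The following is a minimal sketch of one common way such a strategy can be realized: each channel is pooled to a single descriptor, a small 1D convolution mixes each descriptor with its nearest neighbours, and a sigmoid gate rescales the channel. The function name `channel_attention`, the `kernel_size` parameter, the uniform convolution weights (a stand-in for learned weights), and the nested-list data layout are all assumptions made for illustration; the paper's actual module may differ.

```python
import math


def channel_attention(feature_maps, kernel_size=3):
    """Re-weight channels via a 1D convolution over neighbouring channel descriptors.

    feature_maps: list of C channels, each an H x W list of lists of floats.
    kernel_size: odd number of neighbouring channels that interact (hypothetical default).
    Returns (scaled_feature_maps, gates).
    """
    # Global average pooling: one scalar descriptor per channel.
    descriptors = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]

    # Local cross-channel interaction: every channel is mixed only with its
    # kernel_size nearest neighbours (zero padded at the ends). The channel
    # count is never reduced, so there is no bottleneck layer.
    # Uniform weights stand in for the weights a network would learn.
    weights = [1.0 / kernel_size] * kernel_size
    pad = kernel_size // 2
    padded = [0.0] * pad + descriptors + [0.0] * pad

    gates = []
    for c in range(len(descriptors)):
        mixed = sum(w * padded[c + j] for j, w in enumerate(weights))
        gates.append(1.0 / (1.0 + math.exp(-mixed)))  # sigmoid gate in (0, 1)

    # Scale every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_maps, gates)
    ]
    return scaled, gates
```

In this sketch the interaction needs only `kernel_size` weights regardless of the number of channels, which illustrates how such a design can keep the parameter count low. In the pipeline described in the abstract, per-frame features produced by the spatial network would then be passed as a sequence to a GRU.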