Analyzing lower half facial gestures for lip reading applications: Survey on vision techniques

Lip reading has gained popularity due to the proliferation of emerging real-world applications. This article provides a comprehensive review of benchmark datasets available for lip-reading applications and pioneering works that analyze lower facial cues for lip-reading applications. A comprehensive...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Computer vision and image understanding 2023-08, Vol.233, p.103738, Article 103738
Hauptverfasser: S.J., Preethi, B., Niranjana Krupa
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Lip reading has gained popularity due to the proliferation of emerging real-world applications. This article provides a comprehensive review of benchmark datasets available for lip-reading applications and pioneering works that analyze lower facial cues for lip-reading applications. A comprehensive review of lip reading applications is broadly classified into five distinct applications: Lip Reading Biometrics (LRB), Audio Visual Speech Recognition (AVSR), Silent Speech Recognition (SSR), Voice from Lips, and Lip HCI (Human–computer interaction). LRB entails extensive research in the fields of authentication and liveness detection. AVSR covers key findings that have contributed significantly to applications such as voice assistants, video-to-text transcription, hearing aids, and pronunciation-correcting systems. SSR analyzes the efforts made for silent-video-to-text transcription and surveillance camera applications. The voice from lips section discusses applications such as voice for the voiceless and vision-infused speech inpainting. In lip HCI, LR-HCI for smartphones, smart TVs, computers, robots, and musical instruments is reviewed in detail. Comprehensive coverage is given to cutting-edge techniques in computer vision, signal processing, machine learning, and deep learning. The advancements that aid the system in learning to lip-read and authenticate lip gestures, generate text transcription, synthesize voice based on lip movements, and control systems via lip movements (lip HCI) are covered. The work concludes by highlighting the limitations of existing frameworks, the road maps of each application illustrating the evolution of techniques employed over time, and future research avenues in lip-reading applications. •Review of available benchmark datasets for lip-reading applications.•The growth of five types of lip reading application systems, namely, Lip Reading Biometrics (LRB), Audio Visual Speech Recognition (AVSR), Silent Speech Recognition (SSR), Voice from lips, and Lip HCI (Human–computer interaction) are reviewed.•Their progression in the techniques employed over time, covering all signal processing, machine vision, machine learning, and deep learning approaches, are highlighted.•The survey discusses challenges in several lip-reading application areas and provides a pathway to carry out further research.
ISSN:1077-3142
1090-235X
DOI:10.1016/j.cviu.2023.103738