End-to-end visual speech recognition for human-robot interaction
Saved in:
Main authors: , ,
Format: Conference proceedings
Language: eng
Subjects:
Online access: Full text
Abstract: In this paper we present a novel method for word-level visual speech recognition, intended for use in human-robot interaction. The ability of robots to understand natural human speech would significantly improve the quality of human-machine interaction. Despite outstanding breakthroughs achieved in this field in recent years, the challenge remains unresolved. In the current research we focus mainly on the visual component of human speech, the so-called automated lip-reading task, which becomes crucial for human-robot interaction in acoustically noisy environments. The developed method builds on state-of-the-art artificial intelligence technologies and achieves 85.03% speech recognition accuracy using video data alone. Notably, the model was trained and tested on the benchmark LRW database, recorded in the wild, and the presented results surpass many existing results reported by the speech recognition research community.
ISSN: 0094-243X, 1551-7616
DOI: 10.1063/5.0197720