Multimodal Corpus Design for Audio-Visual Speech Recognition in Vehicle Cabin

Bibliographic details
Published in:	IEEE Access, 2021, Vol. 9, pp. 34986-35003
Main authors:	Kashevnik, Alexey; Lashkov, Igor; Axyonov, Alexandr; Ivanko, Denis; Ryumin, Dmitry; Kolchin, Artem; Karpov, Alexey
Format:	Article
Language:	English
Subjects:	Applications programs; Audio data; automatic speech recognition; Driver monitoring; human–computer interaction; Methodology; Mobile computing; Monitoring; Monitoring systems; multimodal corpus; Questions; Russian language; Sensors; Smart phones; Speech; Speech recognition; Task analysis; Vehicles; Video data; Vocabulary; Voice recognition

Description
Abstract:	This paper introduces a new methodology for creating in-the-wild multimodal corpora, designed to be comfortable for the driver, for audio-visual speech recognition in driver monitoring systems. The presented methodology is universal and can be used to record corpora in different languages. We present an analysis of speech recognition systems and voice interfaces for driver monitoring systems that rely on both audio and video data. Multimodal speech recognition makes it possible to use audio data when video data are unusable (e.g., at nighttime) and to use video data in acoustically noisy conditions (e.g., on highways). Our methodology identifies the main steps and requirements for multimodal corpus design, including the development of a new framework for audio-visual corpus creation. We identify the main research questions related to the speech corpus creation task and discuss them in detail. We consider the main use cases that require speech recognition in a vehicle cabin for interaction with a driver monitoring system. We also consider other important use cases in which the system detects a dangerous state of driver drowsiness and starts a question-answer game to prevent dangerous situations. Finally, based on the proposed methodology, we developed a mobile application that allows us to record a corpus for the Russian language. Using this application, we created the RUSAVIC corpus, which is at the moment a unique audio-visual corpus for the Russian language recorded in in-the-wild conditions.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2021.3062752