Audio-Visual Speech Recognition Using Convolutive Bottleneck Networks for a Person with Severe Hearing Loss

Detailed description

Bibliographic details
Published in:	IPSJ Transactions on Computer Vision and Applications 2015, Vol. 7, pp. 64-68
Main authors:	Takashima, Yuki; Kakihara, Yasuhiro; Aihara, Ryo; Takiguchi, Tetsuya; Ariki, Yasuo; Mitani, Nobuyuki; Omori, Kiyohiro; Nakazono, Kaoru
Format:	Article
Language:	English
Subjects:	assistive technology, deep-learning, lip reading, multimodal
Online access:	Full text

Description
Abstract:	In this paper, we propose an audio-visual speech recognition system for a person with an articulation disorder resulting from severe hearing loss. For a person with this type of articulation disorder, the speech style differs so much from that of people without hearing loss that a speaker-independent model for unimpaired persons is hardly useful for recognizing it. We investigate an audio-visual speech recognition system for a person with severe hearing loss in noisy environments, in which a robust feature extraction method using a convolutive bottleneck network (CBN) is applied to audio-visual data. We confirmed the effectiveness of this approach through word-recognition experiments in noisy environments, where the CBN-based feature extraction method outperformed conventional methods.
ISSN:	1882-6695
DOI:	10.2197/ipsjtcva.7.64