A hybrid visual feature extraction method for audio-visual speech recognition
Format: Conference paper
Language: English
Abstract: In this paper, a hybrid visual feature extraction method that combines extended locally linear embedding (LLE) with visemic linear discriminant analysis (LDA) is presented for audio-visual speech recognition (AVSR). First, the extended LLE is introduced to reduce the dimension of the mouth images: it constrains the search for a mouth image's neighborhood to the corresponding individual's dataset rather than the whole dataset, and then maps the high-dimensional mouth image matrices into a low-dimensional Euclidean space. Second, the feature vectors are projected onto the visemic linear discriminant space to find the optimal classification. Finally, in the audio-visual fusion stage, minimum classification error (MCE) training based on segmental generalized probabilistic descent (GPD) is applied to optimize the audio and visual stream weights. Experimental results on the CUAVE database show that the proposed method achieves significantly better performance than classical PCA- and LDA-based methods in visual-only speech recognition. Further experiments demonstrate the robustness of the MCE-based discriminative training method in noisy environments.
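As a rough illustration of the visual front-end the abstract describes, the sketch below restricts each LLE embedding to a single speaker's mouth images (approximating the constraint that neighbors are sought only within the corresponding individual's dataset) and then projects the embedded features with LDA trained on viseme labels. It uses scikit-learn's standard LLE and LDA as stand-ins; the paper's "extended LLE" and visemic LDA details are not reproduced here, and all function and parameter names are illustrative assumptions.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def extract_visual_features(images, speaker_ids, viseme_labels,
                            n_neighbors=10, lle_dim=5, lda_dim=3):
    """Per-speaker LLE embedding followed by a visemic LDA projection.

    Running LLE separately on each speaker's images approximates the
    paper's constraint of finding mouth-image neighborhoods within the
    same individual's dataset instead of the whole dataset.
    """
    # flatten each mouth image into a row vector
    X = images.reshape(len(images), -1).astype(float)
    embedded = np.zeros((len(X), lle_dim))
    for spk in np.unique(speaker_ids):
        idx = np.where(speaker_ids == spk)[0]
        lle = LocallyLinearEmbedding(n_neighbors=n_neighbors,
                                     n_components=lle_dim)
        embedded[idx] = lle.fit_transform(X[idx])
    # project onto the discriminant space spanned by the viseme classes
    lda = LinearDiscriminantAnalysis(n_components=lda_dim)
    return lda.fit_transform(embedded, viseme_labels)
```

For example, 80 mouth images (8x8 pixels) from two speakers with four viseme classes would yield an 80x3 feature matrix.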
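The fusion stage the abstract mentions combines per-stream log-likelihoods with weights lambda and (1 - lambda), and MCE/GPD training descends the gradient of a smoothed (sigmoid) classification-error loss with respect to the stream weight. The sketch below is a simplified utterance-level version of that idea, not the paper's segmental formulation; the function name, loss smoothing, and hyperparameters are assumptions.

```python
import numpy as np

def train_stream_weight(audio_ll, visual_ll, labels,
                        lam=0.5, lr=0.1, steps=200, zeta=1.0):
    """GPD-style minimum classification error (MCE) training of the
    audio stream weight lam; the visual weight is (1 - lam).

    audio_ll, visual_ll: (n_samples, n_classes) log-likelihoods.
    """
    n = len(labels)
    for _ in range(steps):
        grad = 0.0
        for i, y in enumerate(labels):
            s = lam * audio_ll[i] + (1 - lam) * visual_ll[i]
            # misclassification measure: best competitor minus target
            comp = np.delete(s, y)
            d = comp.max() - s[y]
            j = np.delete(np.arange(len(s)), y)[comp.argmax()]
            # smoothed 0/1 loss via a sigmoid, and its derivative
            l = 1.0 / (1.0 + np.exp(-zeta * d))
            dd = ((audio_ll[i][j] - visual_ll[i][j])
                  - (audio_ll[i][y] - visual_ll[i][y]))
            grad += zeta * l * (1.0 - l) * dd
        # probabilistic-descent update, keeping lam a valid weight
        lam = float(np.clip(lam - lr * grad / n, 0.0, 1.0))
    return lam
```

On synthetic data where the audio stream is discriminative and the visual stream is noise, the learned weight moves toward the audio stream, mirroring the intended behavior of MCE-based stream weighting under noise.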
ISSN: 1522-4880, 2381-8549
DOI: 10.1109/ICIP.2009.5413573