Semi-blind speech extraction for robot using visual information and noise statistics

Bibliographic details
Main authors: Saruwatari, Hiroshi; Hirata, Nobuhisa; Hatta, Toshiyuki; Wakisaka, Ryo; Shikano, Kiyohiro; Takatani, Tomoya
Format: Conference proceeding
Language: English
Description
Abstract: This paper addresses the improvement of speech recognition accuracy for ICA-based multichannel noise reduction in a spoken-dialogue robot. First, a new permutation solving method using a probability statistics model is proposed for realistic sound mixtures consisting of point-source speech and diffuse noise. Next, to achieve high recognition accuracy for the early utterances of the target speaker, we introduce a new rapid ICA initialization method that combines robot video information with a prestored bank of initial separation filters. Using this image information, an initial ICA filter fitted to the user's direction can be selected, so that the user's first utterance is preserved. The experimental results show that the proposed approaches can markedly improve word recognition accuracy.
ISSN: 2162-7843
DOI: 10.1109/ISSPIT.2011.6151571