Making a robot recognize three simultaneous sentences in real-time

A humanoid robot under real-world environments usually hears mixtures of sounds, and thus three capabilities are essential for robot audition; sound source localization, separation, and recognition of separated sounds. We have adopted the missing feature theory (MFT) for automatic recognition of sep...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Yamamoto, S., Nakadai, K., Valin, J.-M., Rouat, J., Michaud, F., Komatani, K., Ogata, T., Okuno, H.G.
Format:	Tagungsbericht
Sprache:	eng
Schlagworte:	Acoustic distortion automatic missing feature mask generation Automatic speech recognition continuous speech recognition Humanoid robots Interference Microphone arrays missing feature theory Real time systems robot audition Robotics and automation Source separation Speech recognition Vocabulary
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	A humanoid robot under real-world environments usually hears mixtures of sounds, and thus three capabilities are essential for robot audition; sound source localization, separation, and recognition of separated sounds. We have adopted the missing feature theory (MFT) for automatic recognition of separated speech, and developed the robot audition system. A microphone array is used along with a real-time dedicated implementation of geometric source separation (GSS) and a multi-channel post-filter that gives us a further reduction of interferences from other sources. The automatic speech recognition based on MFT recognizes separated sounds by generating missing feature masks automatically from the post-filtering step. The main advantage of this approach for humanoid robots resides in the fact that the ASR with a clean acoustic model can adapt the distortion of separated sound by consulting the post-filter feature masks. In this paper, we used the improved Julius as an MFT-based automatic speech recognizer (ASR). The Julius is a real-time large vocabulary continuous speech recognition (LVCSR) system. We performed the experiment to evaluate our robot audition system. In this experiment, the system recognizes a sentence, not an isolated word. We showed the improvement in the system performance through three simultaneous speech recognition on the humanoid SIG2.
ISSN:	2153-0858 2153-0866
DOI:	10.1109/IROS.2005.1545094