Making a robot recognize three simultaneous sentences in real-time

A humanoid robot under real-world environments usually hears mixtures of sounds, and thus three capabilities are essential for robot audition; sound source localization, separation, and recognition of separated sounds. We have adopted the missing feature theory (MFT) for automatic recognition of sep...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Yamamoto, S., Nakadai, K., Valin, J.-M., Rouat, J., Michaud, F., Komatani, K., Ogata, T., Okuno, H.G.
Format: Tagungsbericht
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:A humanoid robot under real-world environments usually hears mixtures of sounds, and thus three capabilities are essential for robot audition; sound source localization, separation, and recognition of separated sounds. We have adopted the missing feature theory (MFT) for automatic recognition of separated speech, and developed the robot audition system. A microphone array is used along with a real-time dedicated implementation of geometric source separation (GSS) and a multi-channel post-filter that gives us a further reduction of interferences from other sources. The automatic speech recognition based on MFT recognizes separated sounds by generating missing feature masks automatically from the post-filtering step. The main advantage of this approach for humanoid robots resides in the fact that the ASR with a clean acoustic model can adapt the distortion of separated sound by consulting the post-filter feature masks. In this paper, we used the improved Julius as an MFT-based automatic speech recognizer (ASR). The Julius is a real-time large vocabulary continuous speech recognition (LVCSR) system. We performed the experiment to evaluate our robot audition system. In this experiment, the system recognizes a sentence, not an isolated word. We showed the improvement in the system performance through three simultaneous speech recognition on the humanoid SIG2.
ISSN:2153-0858
2153-0866
DOI:10.1109/IROS.2005.1545094