Real-Time Robot Audition System That Recognizes Simultaneous Speech in The Real World

This paper presents a robot audition system that recognizes simultaneous speech in the real world by using robot-embedded microphones. We have previously reported missing feature theory (MFT) based integration of sound source separation (SSS) and automatic speech recognition (ASR) for building robus...

Bibliographic details
Main authors:	Yamamoto, S., Nakadai, K., Nakano, M., Tsujino, H., Valin, J.-M., Komatani, K., Ogata, T., Okuno, H.G.
Format:	Conference proceeding
Language:	English
Subjects:	Automatic speech recognition; genetic algorithm; Loudspeakers; Microphones; missing feature theory; parameter optimization; Prototypes; Real time systems; real-time processing; robot audition; Robotics and automation; Robots; Robustness; Source separation; Speech recognition; voice activity detection

container_end_page	5338
container_issue
container_start_page	5333
container_title
container_volume
creator	Yamamoto, S.; Nakadai, K.; Nakano, M.; Tsujino, H.; Valin, J.-M.; Komatani, K.; Ogata, T.; Okuno, H.G.
description	This paper presents a robot audition system that recognizes simultaneous speech in the real world using robot-embedded microphones. We previously reported an integration of sound source separation (SSS) and automatic speech recognition (ASR) based on missing feature theory (MFT) for building robust robot audition, and demonstrated that an MFT-based prototype system drastically improved speech recognition performance even when three speakers talked to a robot simultaneously. However, the prototype system had three problems: offline processing, hand-tuned system parameters, and failures in voice activity detection (VAD). To attain online processing, we introduced a FlowDesigner-based architecture that integrates sound source localization (SSL), SSS, and ASR. This architecture brings fast processing and easy implementation because it provides a simple framework for shared-object-based integration. To optimize the parameters, we developed genetic algorithm (GA) based parameter optimization, because it is difficult to build an analytical optimization model for mutually dependent system parameters. To improve VAD, we integrated a new VAD based on the power spectrum and the location of a sound source, since conventional VAD relying only on power often fails because of the low signal-to-noise ratio of simultaneous speech. We then constructed a robot audition system for Honda ASIMO. Experiments on recognition of simultaneous speech in a noisy and echoic environment showed that the system worked online and fast, and performed better in robustness and accuracy.
doi_str_mv	10.1109/IROS.2006.282037
format	Conference Proceeding
fullrecord	<record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_4059274</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4059274</ieee_id><sourcerecordid>4059274</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-28bad90a1cbf41ef4d255053a56321ce8e343830c6dfd98b34f9cf17e33294733</originalsourceid><addsrcrecordid>eNpVjMtqwzAURNUXNKTeF7rRD9iVdCVbWobQpoFAwHboMsj2daPiR7DlRfr1NbQUOpthOMMh5JGziHNmnrfpPosEY3EktGCQXJHAJJpLISUTyqhrshBcQch0HN_8Y1rf_jGl70kwjp9sDhgluV6QQ4q2CXPXIk37ovd0NVXOu76j2WX02NL8ZD1Nsew_OveFI81cOzXedthP8zgjlifquvk2C2YVfe-Hpnogd7VtRgx-e0kOry_5-i3c7Tfb9WoXOp4oHwpd2Mowy8uilhxrWQmlmAKrYhC8RI0gQQMr46qujC5A1qaseYIAwsgEYEmefrwOEY_nwbV2uBwlU0YkEr4B5AxVWQ</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Real-Time Robot Audition System That Recognizes Simultaneous Speech in The Real World</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Yamamoto, S. ; Nakadai, K. ; Nakano, M. ; Tsujino, H. ; Valin, J.-M. ; Komatani, K. ; Ogata, T. ; Okuno, H.G.</creator><creatorcontrib>Yamamoto, S. ; Nakadai, K. ; Nakano, M. ; Tsujino, H. ; Valin, J.-M. ; Komatani, K. ; Ogata, T. ; Okuno, H.G.</creatorcontrib><description>This paper presents a robot audition system that recognizes simultaneous speech in the real world by using robot-embedded microphones. We have previously reported missing feature theory (MFT) based integration of sound source separation (SSS) and automatic speech recognition (ASR) for building robust robot audition. We demonstrated that a MFT-based prototype system drastically improved the performance of speech recognition even when three speakers talked to a robot simultaneously. However, the prototype system had three problems; being offline, hand-tuning of system parameters, and failure in voice activity detection (VAD). 
To attain online processing, we introduced FlowDesigner-based architecture to integrate sound source localization (SSL), SSS and ASR. This architecture brings fast processing and easy implementation because it provides a simple framework of shared-object-based integration. To optimize the parameters, we developed genetic algorithm (GA) based parameter optimization, because it is difficult to build an analytical optimization model for mutually dependent system parameters. To improve VAD, we integrated new VAD based on a power spectrum and location of a sound source into the system, since conventional VAD relying only on power often fails due to low signal-to-noise ratio of simultaneous speech. We, then, constructed a robot audition system for Honda ASIMO. As a result, we showed that the system worked online and fast, and had a better performance in robustness and accuracy through experiments on recognition of simultaneous speech in a noisy and echoic environment</description><identifier>ISSN: 2153-0858</identifier><identifier>ISBN: 9781424402588</identifier><identifier>ISBN: 1424402581</identifier><identifier>EISSN: 2153-0866</identifier><identifier>EISBN: 9781424402595</identifier><identifier>EISBN: 142440259X</identifier><identifier>DOI: 10.1109/IROS.2006.282037</identifier><language>eng</language><publisher>IEEE</publisher><subject>Automatic speech recognition ; genetic algorithm ; Loudspeakers ; Microphones ; missing feature theory ; parameter optimization ; Prototypes ; Real time systems ; real-time processing ; robot audition ; Robotics and automation ; Robots ; Robustness ; Source separation ; Speech recognition ; voice activity detection</subject><ispartof>2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2006, 
p.5333-5338</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4059274$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,777,781,786,787,2052,27906,54901</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4059274$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Yamamoto, S.</creatorcontrib><creatorcontrib>Nakadai, K.</creatorcontrib><creatorcontrib>Nakano, M.</creatorcontrib><creatorcontrib>Tsujino, H.</creatorcontrib><creatorcontrib>Valin, J.-M.</creatorcontrib><creatorcontrib>Komatani, K.</creatorcontrib><creatorcontrib>Ogata, T.</creatorcontrib><creatorcontrib>Okuno, H.G.</creatorcontrib><title>Real-Time Robot Audition System That Recognizes Simultaneous Speech in The Real World</title><title>2006 IEEE/RSJ International Conference on Intelligent Robots and Systems</title><addtitle>IROS</addtitle><description>This paper presents a robot audition system that recognizes simultaneous speech in the real world by using robot-embedded microphones. We have previously reported missing feature theory (MFT) based integration of sound source separation (SSS) and automatic speech recognition (ASR) for building robust robot audition. We demonstrated that a MFT-based prototype system drastically improved the performance of speech recognition even when three speakers talked to a robot simultaneously. However, the prototype system had three problems; being offline, hand-tuning of system parameters, and failure in voice activity detection (VAD). To attain online processing, we introduced FlowDesigner-based architecture to integrate sound source localization (SSL), SSS and ASR. 
This architecture brings fast processing and easy implementation because it provides a simple framework of shared-object-based integration. To optimize the parameters, we developed genetic algorithm (GA) based parameter optimization, because it is difficult to build an analytical optimization model for mutually dependent system parameters. To improve VAD, we integrated new VAD based on a power spectrum and location of a sound source into the system, since conventional VAD relying only on power often fails due to low signal-to-noise ratio of simultaneous speech. We, then, constructed a robot audition system for Honda ASIMO. As a result, we showed that the system worked online and fast, and had a better performance in robustness and accuracy through experiments on recognition of simultaneous speech in a noisy and echoic environment</description><subject>Automatic speech recognition</subject><subject>genetic algorithm</subject><subject>Loudspeakers</subject><subject>Microphones</subject><subject>missing feature theory</subject><subject>parameter optimization</subject><subject>Prototypes</subject><subject>Real time systems</subject><subject>real-time processing</subject><subject>robot audition</subject><subject>Robotics and automation</subject><subject>Robots</subject><subject>Robustness</subject><subject>Source separation</subject><subject>Speech recognition</subject><subject>voice activity 
detection</subject><issn>2153-0858</issn><issn>2153-0866</issn><isbn>9781424402588</isbn><isbn>1424402581</isbn><isbn>9781424402595</isbn><isbn>142440259X</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2006</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNpVjMtqwzAURNUXNKTeF7rRD9iVdCVbWobQpoFAwHboMsj2daPiR7DlRfr1NbQUOpthOMMh5JGziHNmnrfpPosEY3EktGCQXJHAJJpLISUTyqhrshBcQch0HN_8Y1rf_jGl70kwjp9sDhgluV6QQ4q2CXPXIk37ovd0NVXOu76j2WX02NL8ZD1Nsew_OveFI81cOzXedthP8zgjlifquvk2C2YVfe-Hpnogd7VtRgx-e0kOry_5-i3c7Tfb9WoXOp4oHwpd2Mowy8uilhxrWQmlmAKrYhC8RI0gQQMr46qujC5A1qaseYIAwsgEYEmefrwOEY_nwbV2uBwlU0YkEr4B5AxVWQ</recordid><startdate>200610</startdate><enddate>200610</enddate><creator>Yamamoto, S.</creator><creator>Nakadai, K.</creator><creator>Nakano, M.</creator><creator>Tsujino, H.</creator><creator>Valin, J.-M.</creator><creator>Komatani, K.</creator><creator>Ogata, T.</creator><creator>Okuno, H.G.</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>200610</creationdate><title>Real-Time Robot Audition System That Recognizes Simultaneous Speech in The Real World</title><author>Yamamoto, S. ; Nakadai, K. ; Nakano, M. ; Tsujino, H. ; Valin, J.-M. ; Komatani, K. ; Ogata, T. 
; Okuno, H.G.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-28bad90a1cbf41ef4d255053a56321ce8e343830c6dfd98b34f9cf17e33294733</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Automatic speech recognition</topic><topic>genetic algorithm</topic><topic>Loudspeakers</topic><topic>Microphones</topic><topic>missing feature theory</topic><topic>parameter optimization</topic><topic>Prototypes</topic><topic>Real time systems</topic><topic>real-time processing</topic><topic>robot audition</topic><topic>Robotics and automation</topic><topic>Robots</topic><topic>Robustness</topic><topic>Source separation</topic><topic>Speech recognition</topic><topic>voice activity detection</topic><toplevel>online_resources</toplevel><creatorcontrib>Yamamoto, S.</creatorcontrib><creatorcontrib>Nakadai, K.</creatorcontrib><creatorcontrib>Nakano, M.</creatorcontrib><creatorcontrib>Tsujino, H.</creatorcontrib><creatorcontrib>Valin, J.-M.</creatorcontrib><creatorcontrib>Komatani, K.</creatorcontrib><creatorcontrib>Ogata, T.</creatorcontrib><creatorcontrib>Okuno, H.G.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Yamamoto, S.</au><au>Nakadai, K.</au><au>Nakano, M.</au><au>Tsujino, H.</au><au>Valin, J.-M.</au><au>Komatani, K.</au><au>Ogata, T.</au><au>Okuno, H.G.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Real-Time Robot Audition System That Recognizes Simultaneous Speech 
in The Real World</atitle><btitle>2006 IEEE/RSJ International Conference on Intelligent Robots and Systems</btitle><stitle>IROS</stitle><date>2006-10</date><risdate>2006</risdate><spage>5333</spage><epage>5338</epage><pages>5333-5338</pages><issn>2153-0858</issn><eissn>2153-0866</eissn><isbn>9781424402588</isbn><isbn>1424402581</isbn><eisbn>9781424402595</eisbn><eisbn>142440259X</eisbn><abstract>This paper presents a robot audition system that recognizes simultaneous speech in the real world by using robot-embedded microphones. We have previously reported missing feature theory (MFT) based integration of sound source separation (SSS) and automatic speech recognition (ASR) for building robust robot audition. We demonstrated that a MFT-based prototype system drastically improved the performance of speech recognition even when three speakers talked to a robot simultaneously. However, the prototype system had three problems; being offline, hand-tuning of system parameters, and failure in voice activity detection (VAD). To attain online processing, we introduced FlowDesigner-based architecture to integrate sound source localization (SSL), SSS and ASR. This architecture brings fast processing and easy implementation because it provides a simple framework of shared-object-based integration. To optimize the parameters, we developed genetic algorithm (GA) based parameter optimization, because it is difficult to build an analytical optimization model for mutually dependent system parameters. To improve VAD, we integrated new VAD based on a power spectrum and location of a sound source into the system, since conventional VAD relying only on power often fails due to low signal-to-noise ratio of simultaneous speech. We, then, constructed a robot audition system for Honda ASIMO. 
As a result, we showed that the system worked online and fast, and had a better performance in robustness and accuracy through experiments on recognition of simultaneous speech in a noisy and echoic environment</abstract><pub>IEEE</pub><doi>10.1109/IROS.2006.282037</doi><tpages>6</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 2153-0858
ispartof	2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2006, p.5333-5338
issn	2153-0858 2153-0866
language	eng
recordid	cdi_ieee_primary_4059274
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	Automatic speech recognition; genetic algorithm; Loudspeakers; Microphones; missing feature theory; parameter optimization; Prototypes; Real time systems; real-time processing; robot audition; Robotics and automation; Robots; Robustness; Source separation; Speech recognition; voice activity detection
title	Real-Time Robot Audition System That Recognizes Simultaneous Speech in The Real World
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T05%3A06%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Real-Time%20Robot%20Audition%20System%20That%20Recognizes%20Simultaneous%20Speech%20in%20The%20Real%20World&rft.btitle=2006%20IEEE/RSJ%20International%20Conference%20on%20Intelligent%20Robots%20and%20Systems&rft.au=Yamamoto,%20S.&rft.date=2006-10&rft.spage=5333&rft.epage=5338&rft.pages=5333-5338&rft.issn=2153-0858&rft.eissn=2153-0866&rft.isbn=9781424402588&rft.isbn_list=1424402581&rft_id=info:doi/10.1109/IROS.2006.282037&rft_dat=%3Cieee_6IE%3E4059274%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424402595&rft.eisbn_list=142440259X&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4059274&rfr_iscdi=true