Mutual Information Based Dynamic Integration of Multiple Feature Streams for Robust Real-Time LVCSR



Bibliographic details
Published in: IEICE Transactions on Information and Systems 2008/03/01, Vol.E91.D(3), pp.815-824
Main authors: SATO, Shoei, KOBAYASHI, Akio, ONOE, Kazuo, HOMMA, Shinichi, IMAI, Toru, TAKAGI, Tohru, KOBAYASHI, Tetsunori
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 824
container_issue 3
container_start_page 815
container_title IEICE Transactions on Information and Systems
container_volume E91.D
creator SATO, Shoei
KOBAYASHI, Akio
ONOE, Kazuo
HOMMA, Shinichi
IMAI, Toru
TAKAGI, Tohru
KOBAYASHI, Tetsunori
description We present a novel method of integrating the likelihoods of multiple feature streams, representing different acoustic aspects, for robust speech recognition. The integration algorithm dynamically calculates a frame-wise stream weight so that a higher weight is given to a stream that is robust to a variety of noisy environments or speaking styles. Such a robust stream is expected to show discriminative ability. A conventional method proposed for the recognition of spoken digits calculates the weights from the entropy of the whole set of HMM states. This paper extends the dynamic weighting to a real-time large-vocabulary continuous speech recognition (LVCSR) system. The proposed weight is calculated in real-time from mutual information between an input stream and active HMM states in a search space without an additional likelihood calculation. Furthermore, the mutual information takes the width of the search space into account by calculating the marginal entropy from the number of active states. In this paper, we integrate three features that are extracted through auditory filters by taking into account the human auditory system's ability to extract amplitude and frequency modulations. Due to this, features representing energy, amplitude drift, and resonant frequency drifts are integrated. These features are expected to provide complementary clues for speech recognition. Speech recognition experiments on field reports and spontaneous commentary from Japanese broadcast news showed that the proposed method reduced error words by 9.2% in field reports and 4.7% in spontaneous commentaries relative to the best result obtained from a single stream.
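A minimal Python sketch of the frame-wise weighting idea described in the abstract, for illustration only. It rests on assumptions that are not taken from the paper: every stream is scored against the same N active HMM states; the state prior is uniform, so the marginal entropy is log N (one reading of "calculating the marginal entropy from the number of active states"); the conditional entropy is the entropy of the softmax of that stream's log-likelihoods; weights are the per-stream mutual information values normalised to sum to 1; and streams are combined by a weighted sum of log-likelihoods. All function names and the stream names "energy" and "fm_drift" are hypothetical, and the published formulas may differ.

```python
import math
from typing import Dict, List, Sequence


def state_posterior(log_likes: Sequence[float]) -> List[float]:
    """Softmax over the active states' log-likelihoods (uniform state prior assumed)."""
    peak = max(log_likes)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in log_likes]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mutual_information(log_likes: Sequence[float]) -> float:
    """Approximate I(S; X) = H(S) - H(S | X) over the N active states of one stream.

    H(S) is assumed to be log N, so a wider search space raises the ceiling;
    H(S | X) is the entropy of this stream's per-frame state posterior.
    The result is clamped at 0 to absorb floating-point rounding.
    """
    n = len(log_likes)
    return max(0.0, math.log(n) - entropy(state_posterior(log_likes)))


def stream_weights(active_log_likes: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Normalise per-stream MI values into frame-wise weights that sum to 1."""
    mi = {name: mutual_information(ll) for name, ll in active_log_likes.items()}
    total = sum(mi.values())
    if total <= 0.0:  # no stream is informative: fall back to equal weights
        return {name: 1.0 / len(mi) for name in mi}
    return {name: value / total for name, value in mi.items()}


def combined_log_likelihood(stream_ll: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-stream log-likelihoods for a single HMM state."""
    return sum(weights[name] * stream_ll[name] for name in stream_ll)


if __name__ == "__main__":
    # Hypothetical frame: log-likelihoods of the same 3 active states per stream.
    frame = {
        "energy": [0.0, 0.0, 0.0],        # flat: no discrimination between states
        "fm_drift": [0.0, -50.0, -50.0],  # peaked: strongly discriminative
    }
    print(stream_weights(frame))
```

With this hypothetical frame, nearly all of the weight goes to the peaked stream. Because the marginal entropy is taken as log N, the same posterior sharpness scores higher when more states are active, which is one way to read the abstract's statement that the measure accounts for the width of the search space.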
doi_str_mv 10.1093/ietisy/e91-d.3.815
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1671226222</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1671226222</sourcerecordid><originalsourceid>FETCH-LOGICAL-c539t-c815b853d00c65d8d6a2c119acc23a45c5dfe2162f7a8f16bb8115a72e39c9343</originalsourceid><addsrcrecordid>eNo9kM1u2zAQhImiBeqmfYGeeAnQixwuacrSsXV-CwcFnLRXYk0tEwaU5JLUwW9fBkp92sN8MzsYxr6CWIJo1YWn7NPxglqouqVaNqDfsQWsV7oCVcN7thAt1FWjlfzIPqX0IgQ0EvSC2fspTxj43eDG2GP248B_YKKOXx4H7L0tSqanOCuj4_dTyP4QiF8T5ikSf8iRsE-8-Plu3E8p8x1hqB59T3z7Z_Ow-8w-OAyJvrzdM_b7-upxc1ttf93cbb5vK6tVmytbWu9LxU4IW-uu6WqUFqBFa6XClba6cyShlm6NjYN6v28ANK4lqda2aqXO2Lc59xDHvxOlbHqfLIWAA41TMlCvQcpaSllQOaM2jilFcuYQfY_xaECY10XNvKgpi5rOKFO6FdP5Wz4mi8FFHKxPJ6cUElatFoX7OXMvKeMTnQCM2dtAJhdj8oMzVyX8soT_v-XJCbLPGA0N6h-soJQP</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1671226222</pqid></control><display><type>article</type><title>Mutual Information Based Dynamic Integration of Multiple Feature Streams for Robust Real-Time LVCSR</title><source>J-STAGE (Japan Science &amp; Technology Information Aggregator, Electronic) Freely Available Titles - Japanese</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>SATO, Shoei ; KOBAYASHI, Akio ; ONOE, Kazuo ; HOMMA, Shinichi ; IMAI, Toru ; TAKAGI, Tohru ; KOBAYASHI, Tetsunori</creator><creatorcontrib>SATO, Shoei ; KOBAYASHI, Akio ; ONOE, Kazuo ; HOMMA, Shinichi ; IMAI, Toru ; TAKAGI, Tohru ; KOBAYASHI, Tetsunori</creatorcontrib><description>We present a novel method of integrating the likelihoods of multiple feature streams, representing different acoustic aspects, for robust speech recognition. The integration algorithm dynamically calculates a frame-wise stream weight so that a higher weight is given to a stream that is robust to a variety of noisy environments or speaking styles. 
Such a robust stream is expected to show discriminative ability. A conventional method proposed for the recognition of spoken digits calculates the weights from the entropy of the whole set of HMM states. This paper extends the dynamic weighting to a real-time large-vocabulary continuous speech recognition (LVCSR) system. The proposed weight is calculated in real-time from mutual information between an input stream and active HMM states in a search space without an additional likelihood calculation. Furthermore, the mutual information takes the width of the search space into account by calculating the marginal entropy from the number of active states. In this paper, we integrate three features that are extracted through auditory filters by taking into account the human auditory system's ability to extract amplitude and frequency modulations. Due to this, features representing energy, amplitude drift, and resonant frequency drifts are integrated. These features are expected to provide complementary clues for speech recognition. 
Speech recognition experiments on field reports and spontaneous commentary from Japanese broadcast news showed that the proposed method reduced error words by 9.2% in field reports and 4.7% in spontaneous commentaries relative to the best result obtained from a single stream.</description><identifier>ISSN: 0916-8532</identifier><identifier>ISSN: 1745-1361</identifier><identifier>EISSN: 1745-1361</identifier><identifier>DOI: 10.1093/ietisy/e91-d.3.815</identifier><language>eng</language><publisher>Oxford: The Institute of Electronics, Information and Communication Engineers</publisher><subject>active hypotheses ; Applied sciences ; Artificial intelligence ; Computer science; control theory; systems ; Dynamical systems ; Dynamics ; Entropy ; Exact sciences and technology ; Information, signal and communications theory ; Mathematical analysis ; Modulation, demodulation ; mutual information ; Real time ; Searching ; Signal and communications theory ; Signal processing ; Speech and sound recognition and synthesis. 
Linguistics ; Speech processing ; Speech recognition ; stream integration ; Streams ; Systems, networks and services of telecommunications ; Telecommunications ; Telecommunications and information theory ; Transmission and modulation (techniques and equipments)</subject><ispartof>IEICE Transactions on Information and Systems, 2008/03/01, Vol.E91.D(3), pp.815-824</ispartof><rights>2008 The Institute of Electronics, Information and Communication Engineers</rights><rights>2008 INIST-CNRS</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,1883,4024,27923,27924,27925</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=20214950$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>SATO, Shoei</creatorcontrib><creatorcontrib>KOBAYASHI, Akio</creatorcontrib><creatorcontrib>ONOE, Kazuo</creatorcontrib><creatorcontrib>HOMMA, Shinichi</creatorcontrib><creatorcontrib>IMAI, Toru</creatorcontrib><creatorcontrib>TAKAGI, Tohru</creatorcontrib><creatorcontrib>KOBAYASHI, Tetsunori</creatorcontrib><title>Mutual Information Based Dynamic Integration of Multiple Feature Streams for Robust Real-Time LVCSR</title><title>IEICE Transactions on Information and Systems</title><addtitle>IEICE Trans. Inf. &amp; Syst.</addtitle><description>We present a novel method of integrating the likelihoods of multiple feature streams, representing different acoustic aspects, for robust speech recognition. The integration algorithm dynamically calculates a frame-wise stream weight so that a higher weight is given to a stream that is robust to a variety of noisy environments or speaking styles. Such a robust stream is expected to show discriminative ability. 
A conventional method proposed for the recognition of spoken digits calculates the weights from the entropy of the whole set of HMM states. This paper extends the dynamic weighting to a real-time large-vocabulary continuous speech recognition (LVCSR) system. The proposed weight is calculated in real-time from mutual information between an input stream and active HMM states in a search space without an additional likelihood calculation. Furthermore, the mutual information takes the width of the search space into account by calculating the marginal entropy from the number of active states. In this paper, we integrate three features that are extracted through auditory filters by taking into account the human auditory system's ability to extract amplitude and frequency modulations. Due to this, features representing energy, amplitude drift, and resonant frequency drifts are integrated. These features are expected to provide complementary clues for speech recognition. Speech recognition experiments on field reports and spontaneous commentary from Japanese broadcast news showed that the proposed method reduced error words by 9.2% in field reports and 4.7% in spontaneous commentaries relative to the best result obtained from a single stream.</description><subject>active hypotheses</subject><subject>Applied sciences</subject><subject>Artificial intelligence</subject><subject>Computer science; control theory; systems</subject><subject>Dynamical systems</subject><subject>Dynamics</subject><subject>Entropy</subject><subject>Exact sciences and technology</subject><subject>Information, signal and communications theory</subject><subject>Mathematical analysis</subject><subject>Modulation, demodulation</subject><subject>mutual information</subject><subject>Real time</subject><subject>Searching</subject><subject>Signal and communications theory</subject><subject>Signal processing</subject><subject>Speech and sound recognition and synthesis. 
Linguistics</subject><subject>Speech processing</subject><subject>Speech recognition</subject><subject>stream integration</subject><subject>Streams</subject><subject>Systems, networks and services of telecommunications</subject><subject>Telecommunications</subject><subject>Telecommunications and information theory</subject><subject>Transmission and modulation (techniques and equipments)</subject><issn>0916-8532</issn><issn>1745-1361</issn><issn>1745-1361</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2008</creationdate><recordtype>article</recordtype><recordid>eNo9kM1u2zAQhImiBeqmfYGeeAnQixwuacrSsXV-CwcFnLRXYk0tEwaU5JLUwW9fBkp92sN8MzsYxr6CWIJo1YWn7NPxglqouqVaNqDfsQWsV7oCVcN7thAt1FWjlfzIPqX0IgQ0EvSC2fspTxj43eDG2GP248B_YKKOXx4H7L0tSqanOCuj4_dTyP4QiF8T5ikSf8iRsE-8-Plu3E8p8x1hqB59T3z7Z_Ow-8w-OAyJvrzdM_b7-upxc1ttf93cbb5vK6tVmytbWu9LxU4IW-uu6WqUFqBFa6XClba6cyShlm6NjYN6v28ANK4lqda2aqXO2Lc59xDHvxOlbHqfLIWAA41TMlCvQcpaSllQOaM2jilFcuYQfY_xaECY10XNvKgpi5rOKFO6FdP5Wz4mi8FFHKxPJ6cUElatFoX7OXMvKeMTnQCM2dtAJhdj8oMzVyX8soT_v-XJCbLPGA0N6h-soJQP</recordid><startdate>2008</startdate><enddate>2008</enddate><creator>SATO, Shoei</creator><creator>KOBAYASHI, Akio</creator><creator>ONOE, Kazuo</creator><creator>HOMMA, Shinichi</creator><creator>IMAI, Toru</creator><creator>TAKAGI, Tohru</creator><creator>KOBAYASHI, Tetsunori</creator><general>The Institute of Electronics, Information and Communication Engineers</general><general>Oxford University Press</general><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>2008</creationdate><title>Mutual Information Based Dynamic Integration of Multiple Feature Streams for Robust Real-Time LVCSR</title><author>SATO, Shoei ; KOBAYASHI, Akio ; ONOE, Kazuo ; HOMMA, Shinichi ; IMAI, Toru ; TAKAGI, Tohru ; KOBAYASHI, 
Tetsunori</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c539t-c815b853d00c65d8d6a2c119acc23a45c5dfe2162f7a8f16bb8115a72e39c9343</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2008</creationdate><topic>active hypotheses</topic><topic>Applied sciences</topic><topic>Artificial intelligence</topic><topic>Computer science; control theory; systems</topic><topic>Dynamical systems</topic><topic>Dynamics</topic><topic>Entropy</topic><topic>Exact sciences and technology</topic><topic>Information, signal and communications theory</topic><topic>Mathematical analysis</topic><topic>Modulation, demodulation</topic><topic>mutual information</topic><topic>Real time</topic><topic>Searching</topic><topic>Signal and communications theory</topic><topic>Signal processing</topic><topic>Speech and sound recognition and synthesis. Linguistics</topic><topic>Speech processing</topic><topic>Speech recognition</topic><topic>stream integration</topic><topic>Streams</topic><topic>Systems, networks and services of telecommunications</topic><topic>Telecommunications</topic><topic>Telecommunications and information theory</topic><topic>Transmission and modulation (techniques and equipments)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>SATO, Shoei</creatorcontrib><creatorcontrib>KOBAYASHI, Akio</creatorcontrib><creatorcontrib>ONOE, Kazuo</creatorcontrib><creatorcontrib>HOMMA, Shinichi</creatorcontrib><creatorcontrib>IMAI, Toru</creatorcontrib><creatorcontrib>TAKAGI, Tohru</creatorcontrib><creatorcontrib>KOBAYASHI, Tetsunori</creatorcontrib><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with 
Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEICE Transactions on Information and Systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>SATO, Shoei</au><au>KOBAYASHI, Akio</au><au>ONOE, Kazuo</au><au>HOMMA, Shinichi</au><au>IMAI, Toru</au><au>TAKAGI, Tohru</au><au>KOBAYASHI, Tetsunori</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Mutual Information Based Dynamic Integration of Multiple Feature Streams for Robust Real-Time LVCSR</atitle><jtitle>IEICE Transactions on Information and Systems</jtitle><addtitle>IEICE Trans. Inf. &amp; Syst.</addtitle><date>2008</date><risdate>2008</risdate><volume>E91.D</volume><issue>3</issue><spage>815</spage><epage>824</epage><pages>815-824</pages><issn>0916-8532</issn><issn>1745-1361</issn><eissn>1745-1361</eissn><abstract>We present a novel method of integrating the likelihoods of multiple feature streams, representing different acoustic aspects, for robust speech recognition. The integration algorithm dynamically calculates a frame-wise stream weight so that a higher weight is given to a stream that is robust to a variety of noisy environments or speaking styles. Such a robust stream is expected to show discriminative ability. A conventional method proposed for the recognition of spoken digits calculates the weights from the entropy of the whole set of HMM states. This paper extends the dynamic weighting to a real-time large-vocabulary continuous speech recognition (LVCSR) system. The proposed weight is calculated in real-time from mutual information between an input stream and active HMM states in a search space without an additional likelihood calculation. 
Furthermore, the mutual information takes the width of the search space into account by calculating the marginal entropy from the number of active states. In this paper, we integrate three features that are extracted through auditory filters by taking into account the human auditory system's ability to extract amplitude and frequency modulations. Due to this, features representing energy, amplitude drift, and resonant frequency drifts are integrated. These features are expected to provide complementary clues for speech recognition. Speech recognition experiments on field reports and spontaneous commentary from Japanese broadcast news showed that the proposed method reduced error words by 9.2% in field reports and 4.7% in spontaneous commentaries relative to the best result obtained from a single stream.</abstract><cop>Oxford</cop><pub>The Institute of Electronics, Information and Communication Engineers</pub><doi>10.1093/ietisy/e91-d.3.815</doi><tpages>10</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0916-8532
ispartof IEICE Transactions on Information and Systems, 2008/03/01, Vol.E91.D(3), pp.815-824
issn 0916-8532
1745-1361
1745-1361
language eng
recordid cdi_proquest_miscellaneous_1671226222
source J-STAGE (Japan Science & Technology Information Aggregator, Electronic) Freely Available Titles - Japanese; EZB-FREE-00999 freely available EZB journals
subjects active hypotheses
Applied sciences
Artificial intelligence
Computer science; control theory; systems
Dynamical systems
Dynamics
Entropy
Exact sciences and technology
Information, signal and communications theory
Mathematical analysis
Modulation, demodulation
mutual information
Real time
Searching
Signal and communications theory
Signal processing
Speech and sound recognition and synthesis. Linguistics
Speech processing
Speech recognition
stream integration
Streams
Systems, networks and services of telecommunications
Telecommunications
Telecommunications and information theory
Transmission and modulation (techniques and equipments)
title Mutual Information Based Dynamic Integration of Multiple Feature Streams for Robust Real-Time LVCSR
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-20T05%3A34%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mutual%20Information%20Based%20Dynamic%20Integration%20of%20Multiple%20Feature%20Streams%20for%20Robust%20Real-Time%20LVCSR&rft.jtitle=IEICE%20Transactions%20on%20Information%20and%20Systems&rft.au=SATO,%20Shoei&rft.date=2008&rft.volume=E91.D&rft.issue=3&rft.spage=815&rft.epage=824&rft.pages=815-824&rft.issn=0916-8532&rft.eissn=1745-1361&rft_id=info:doi/10.1093/ietisy/e91-d.3.815&rft_dat=%3Cproquest_cross%3E1671226222%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1671226222&rft_id=info:pmid/&rfr_iscdi=true