A phonetic feature based lattice rescoring approach to LVCSR

Large Vocabulary Continuous Speech Recognition (LVCSR) systems decode the input speech using diverse information sources, such as acoustic, lexical, and linguistic. Although most of the unreliable hypotheses are pruned during the recognition process, current state-of-the-art systems often make error...

Bibliographic details
Main authors:	Siniscalchi, S.M.; Svendsen, T.; Chin-Hui Lee
Format:	Conference proceeding
Language:	eng
Subjects:	Acoustic signal detection; Artificial neural networks; Automatic speech recognition; Decoding; Detectors; Hidden Markov models; Humans; Lattices; neural networks; Speech recognition; Vocabulary
Online access:	Order full text

container_end_page	3868
container_issue
container_start_page	3865
container_title
container_volume
creator	Siniscalchi, S.M. ; Svendsen, T. ; Chin-Hui Lee
description	Large Vocabulary Continuous Speech Recognition (LVCSR) systems decode the input speech using diverse information sources, such as acoustic, lexical, and linguistic. Although most of the unreliable hypotheses are pruned during the recognition process, current state-of-the-art systems often make errors that are “unreasonable” for human listeners. Several studies have shown that a proper integration of acoustic-phonetic information can be beneficial to reducing such errors. We have previously shown that high-accuracy phone recognition can be achieved if a bank of speech attribute detectors is used to compute a confidence score describing attribute activation levels that the current frame exhibits. In those experiments, the phone recognition system did not rely on the language model to follow their word sequence constraints, and the vocabulary was small. In this work, we extend our approach to LVCSR by introducing a second recognition step during which additional information not directly used during conventional log-likelihood based decoding is introduced. Experimental results show promising performance.
doi_str_mv	10.1109/ICASSP.2009.4960471
format	Conference Proceeding
fullrecord	<record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_4960471</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4960471</ieee_id><sourcerecordid>4960471</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-99386f30725f00489e7e02bf2e6b05cae8d7673c8832236472ef673fe759f6533</originalsourceid><addsrcrecordid>eNpVUE1LxDAUjF9gWfsL9pI_kJrk5RO8LEVXoaBYFW9L2n1xK-u2pPXgv7fgXpzLMDPw3jCELAUvhOD--qFc1fVTITn3hfKGKytOSO6tE0oqJUErfUoyCdYz4fn72b8M3DnJhJacGaH8JcnH8ZPPUBqE0hm5WdFh1x9w6loaMUzfCWkTRtzSfZhmE2nCse1Td_igYRhSH9odnXpavZX18xW5iGE_Yn7kBXm9u30p71n1uJ5bV6wTVk_Me3AmArdSx_mz82iRyyZKNA3XbUC3tcZC6xxICUZZiXHWEa320WiABVn-3e0QcTOk7iukn81xC_gFe0BMVA</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>A phonetic feature based lattice rescoring approach to LVCSR</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Siniscalchi, S.M. ; Svendsen, T. ; Chin-Hui Lee</creator><creatorcontrib>Siniscalchi, S.M. ; Svendsen, T. ; Chin-Hui Lee</creatorcontrib><description>Large Vocabulary Continuous Speech Recognition (LVCSR) systems decode the input speech using diverse information sources, such as acoustic, lexical, and linguistic. Although most of the unreliable hypotheses are pruned during the recognition process, current state-of-the-art systems often make errors that are “unreasonable” for human listeners. Several studies have shown that a proper integration of acoustic-phonetic information can be beneficial to reducing such errors. We have previously shown that high-accuracy phone recognition can be achieved if a bank of speech attribute detectors is used to compute a confidence score describing attribute activation levels that the current frame exhibits. 
In those experiments, the phone recognition system did not rely on the language model to follow their word sequence constraints, and the vocabulary was small. In this work, we extend our approach to LVCSR by introducing a second recognition step during which additional information not directly used during conventional log-likelihood based decoding is introduced. Experimental results show promising performance.</description><identifier>ISSN: 1520-6149</identifier><identifier>ISBN: 9781424423538</identifier><identifier>ISBN: 1424423538</identifier><identifier>EISSN: 2379-190X</identifier><identifier>EISBN: 9781424423545</identifier><identifier>EISBN: 1424423546</identifier><identifier>DOI: 10.1109/ICASSP.2009.4960471</identifier><language>eng</language><publisher>IEEE</publisher><subject>Acoustic signal detection ; Artificial neural networks ; Automatic speech recognition ; Decoding ; Detectors ; Hidden Markov models ; Humans ; Lattices ; neural networks ; Speech recognition ; Vocabulary</subject><ispartof>2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, p.3865-3868</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4960471$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2058,27925,54920</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4960471$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Siniscalchi, S.M.</creatorcontrib><creatorcontrib>Svendsen, T.</creatorcontrib><creatorcontrib>Chin-Hui Lee</creatorcontrib><title>A phonetic feature based lattice rescoring approach to LVCSR</title><title>2009 IEEE International Conference on Acoustics, Speech and Signal 
Processing</title><addtitle>ICASSP</addtitle><description>Large Vocabulary Continuous Speech Recognition (LVCSR) systems decode the input speech using diverse information sources, such as acoustic, lexical, and linguistic. Although most of the unreliable hypotheses are pruned during the recognition process, current state-of-the-art systems often make errors that are “unreasonable” for human listeners. Several studies have shown that a proper integration of acoustic-phonetic information can be beneficial to reducing such errors. We have previously shown that high-accuracy phone recognition can be achieved if a bank of speech attribute detectors is used to compute a confidence score describing attribute activation levels that the current frame exhibits. In those experiments, the phone recognition system did not rely on the language model to follow their word sequence constraints, and the vocabulary was small. In this work, we extend our approach to LVCSR by introducing a second recognition step during which additional information not directly used during conventional log-likelihood based decoding is introduced. 
Experimental results show promising performance.</description><subject>Acoustic signal detection</subject><subject>Artificial neural networks</subject><subject>Automatic speech recognition</subject><subject>Decoding</subject><subject>Detectors</subject><subject>Hidden Markov models</subject><subject>Humans</subject><subject>Lattices</subject><subject>neural networks</subject><subject>Speech recognition</subject><subject>Vocabulary</subject><issn>1520-6149</issn><issn>2379-190X</issn><isbn>9781424423538</isbn><isbn>1424423538</isbn><isbn>9781424423545</isbn><isbn>1424423546</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2009</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNpVUE1LxDAUjF9gWfsL9pI_kJrk5RO8LEVXoaBYFW9L2n1xK-u2pPXgv7fgXpzLMDPw3jCELAUvhOD--qFc1fVTITn3hfKGKytOSO6tE0oqJUErfUoyCdYz4fn72b8M3DnJhJacGaH8JcnH8ZPPUBqE0hm5WdFh1x9w6loaMUzfCWkTRtzSfZhmE2nCse1Td_igYRhSH9odnXpavZX18xW5iGE_Yn7kBXm9u30p71n1uJ5bV6wTVk_Me3AmArdSx_mz82iRyyZKNA3XbUC3tcZC6xxICUZZiXHWEa320WiABVn-3e0QcTOk7iukn81xC_gFe0BMVA</recordid><startdate>200904</startdate><enddate>200904</enddate><creator>Siniscalchi, S.M.</creator><creator>Svendsen, T.</creator><creator>Chin-Hui Lee</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>200904</creationdate><title>A phonetic feature based lattice rescoring approach to LVCSR</title><author>Siniscalchi, S.M. ; Svendsen, T. 
; Chin-Hui Lee</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-99386f30725f00489e7e02bf2e6b05cae8d7673c8832236472ef673fe759f6533</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Acoustic signal detection</topic><topic>Artificial neural networks</topic><topic>Automatic speech recognition</topic><topic>Decoding</topic><topic>Detectors</topic><topic>Hidden Markov models</topic><topic>Humans</topic><topic>Lattices</topic><topic>neural networks</topic><topic>Speech recognition</topic><topic>Vocabulary</topic><toplevel>online_resources</toplevel><creatorcontrib>Siniscalchi, S.M.</creatorcontrib><creatorcontrib>Svendsen, T.</creatorcontrib><creatorcontrib>Chin-Hui Lee</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Siniscalchi, S.M.</au><au>Svendsen, T.</au><au>Chin-Hui Lee</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>A phonetic feature based lattice rescoring approach to LVCSR</atitle><btitle>2009 IEEE International Conference on Acoustics, Speech and Signal Processing</btitle><stitle>ICASSP</stitle><date>2009-04</date><risdate>2009</risdate><spage>3865</spage><epage>3868</epage><pages>3865-3868</pages><issn>1520-6149</issn><eissn>2379-190X</eissn><isbn>9781424423538</isbn><isbn>1424423538</isbn><eisbn>9781424423545</eisbn><eisbn>1424423546</eisbn><abstract>Large Vocabulary Continuous Speech Recognition (LVCSR) systems decode the input speech using 
diverse information sources, such as acoustic, lexical, and linguistic. Although most of the unreliable hypotheses are pruned during the recognition process, current state-of-the-art systems often make errors that are “unreasonable” for human listeners. Several studies have shown that a proper integration of acoustic-phonetic information can be beneficial to reducing such errors. We have previously shown that high-accuracy phone recognition can be achieved if a bank of speech attribute detectors is used to compute a confidence score describing attribute activation levels that the current frame exhibits. In those experiments, the phone recognition system did not rely on the language model to follow their word sequence constraints, and the vocabulary was small. In this work, we extend our approach to LVCSR by introducing a second recognition step during which additional information not directly used during conventional log-likelihood based decoding is introduced. Experimental results show promising performance.</abstract><pub>IEEE</pub><doi>10.1109/ICASSP.2009.4960471</doi><tpages>4</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1520-6149
ispartof	2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, p.3865-3868
issn	1520-6149 2379-190X
language	eng
recordid	cdi_ieee_primary_4960471
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	Acoustic signal detection ; Artificial neural networks ; Automatic speech recognition ; Decoding ; Detectors ; Hidden Markov models ; Humans ; Lattices ; neural networks ; Speech recognition ; Vocabulary
title	A phonetic feature based lattice rescoring approach to LVCSR
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T06%3A34%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20phonetic%20feature%20based%20lattice%20rescoring%20approach%20to%20LVCSR&rft.btitle=2009%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing&rft.au=Siniscalchi,%20S.M.&rft.date=2009-04&rft.spage=3865&rft.epage=3868&rft.pages=3865-3868&rft.issn=1520-6149&rft.eissn=2379-190X&rft.isbn=9781424423538&rft.isbn_list=1424423538&rft_id=info:doi/10.1109/ICASSP.2009.4960471&rft_dat=%3Cieee_6IE%3E4960471%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424423545&rft.eisbn_list=1424423546&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4960471&rfr_iscdi=true