Pronunciation modeling using a finite-state transducer representation

The MIT SUMMIT speech recognition system models pronunciation using a phonemic baseform dictionary along with rewrite rules for modeling phonological variation and multi-word reductions. Each pronunciation component is encoded within a finite-state transducer (FST) representation whose transition we...

Detailed description

Bibliographic details
Published in: Speech communication 2005-06, Vol.46 (2), p.189-203
Main authors: Hazen, Timothy J., Hetherington, I. Lee, Shu, Han, Livescu, Karen
Format: Article
Language: eng
Online access: Full text
container_end_page 203
container_issue 2
container_start_page 189
container_title Speech communication
container_volume 46
creator Hazen, Timothy J.
Hetherington, I. Lee
Shu, Han
Livescu, Karen
description The MIT SUMMIT speech recognition system models pronunciation using a phonemic baseform dictionary along with rewrite rules for modeling phonological variation and multi-word reductions. Each pronunciation component is encoded within a finite-state transducer (FST) representation whose transition weights can be trained using an EM algorithm for finite-state networks. This paper explains the modeling approach we use and the details of its realization. We demonstrate the benefits and weaknesses of the approach both conceptually and empirically using the recognizer for our JUPITER weather information system. Our experiments demonstrate that the use of phonological rewrite rules within our system achieves word error rate reductions between 4% and 9% over different test sets when compared against a system using no phonological rewrite rules.
doi_str_mv 10.1016/j.specom.2005.03.004
format Article
publisher Amsterdam: Elsevier B.V
coden SCOMDH
rights 2005 Elsevier B.V.
2005 INIST-CNRS
tpages 15
lds50 peer_reviewed
oa free_for_read
fulltext fulltext
identifier ISSN: 0167-6393
ispartof Speech communication, 2005-06, Vol.46 (2), p.189-203
issn 0167-6393
1872-7182
language eng
recordid cdi_proquest_miscellaneous_85631071
source Elsevier ScienceDirect Journals Complete
subjects Applied sciences
Exact sciences and technology
Information, signal and communications theory
Signal processing
Speech processing
Telecommunications and information theory
title Pronunciation modeling using a finite-state transducer representation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T13%3A55%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Pronunciation%20modeling%20using%20a%20finite-state%20transducer%20representation&rft.jtitle=Speech%20communication&rft.au=Hazen,%20Timothy%20J.&rft.date=2005-06-01&rft.volume=46&rft.issue=2&rft.spage=189&rft.epage=203&rft.pages=189-203&rft.issn=0167-6393&rft.eissn=1872-7182&rft.coden=SCOMDH&rft_id=info:doi/10.1016/j.specom.2005.03.004&rft_dat=%3Cproquest_cross%3E85631071%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=29528648&rft_id=info:pmid/&rft_els_id=S0167639305000361&rfr_iscdi=true