Pronunciation modeling using a finite-state transducer representation

The MIT SUMMIT speech recognition system models pronunciation using a phonemic baseform dictionary along with rewrite rules for modeling phonological variation and multi-word reductions. Each pronunciation component is encoded within a finite-state transducer (FST) representation whose transition we...

Bibliographic details
Published in:	Speech communication 2005-06, Vol.46 (2), p.189-203
Main authors:	Hazen, Timothy J.; Hetherington, I. Lee; Shu, Han; Livescu, Karen
Format:	Article
Language:	eng
Subjects:	Applied sciences; Exact sciences and technology; Information, signal and communications theory; Signal processing; Speech processing; Telecommunications and information theory
Online access:	Full text

container_end_page	203
container_issue	2
container_start_page	189
container_title	Speech communication
container_volume	46
creator	Hazen, Timothy J.; Hetherington, I. Lee; Shu, Han; Livescu, Karen
description	The MIT SUMMIT speech recognition system models pronunciation using a phonemic baseform dictionary along with rewrite rules for modeling phonological variation and multi-word reductions. Each pronunciation component is encoded within a finite-state transducer (FST) representation whose transition weights can be trained using an EM algorithm for finite-state networks. This paper explains the modeling approach we use and the details of its realization. We demonstrate the benefits and weaknesses of the approach both conceptually and empirically using the recognizer for our JUPITER weather information system. Our experiments demonstrate that the use of phonological rewrite rules within our system achieves word error rate reductions between 4% and 9% over different test sets when compared against a system using no phonological rewrite rules.
doi_str_mv	10.1016/j.specom.2005.03.004
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_85631071</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0167639305000361</els_id><sourcerecordid>85631071</sourcerecordid><originalsourceid>FETCH-LOGICAL-c475t-320e205d7154ad2bf14267ceaf1cb02aa0cdaf3f9d1ff2ce346924f5e98cca533</originalsourceid><addsrcrecordid>eNqNkUtLxDAUhYMoOI7-Axfd6K41jz7SjSDD-IABXeg6ZG5vJEObjkkq-O9treBO3dy7-c653HMIOWc0Y5SVV7ss7BH6LuOUFhkVGaX5AVkwWfG0YpIfksWIVWkpanFMTkLY0ZGQki_I-sn3bnBgdbS9S7q-wda612QI09SJsc5GTEPUEZPotQvNAOgTj3uPAV380p2SI6PbgGffe0lebtfPq_t083j3sLrZpJBXRUwFp8hp0VSsyHXDt4blvKwAtWGwpVxrCo02wtQNM4YDiryseW4KrCWALoRYksvZd-_7twFDVJ0NgG2rHfZDULIoBaMV-wcohCgr-SfI64LLMp_AfAbB9yF4NGrvbaf9h2JUTS2onZpbUFMLigo1ZjzKLr79dQDdmjFBsOFHOz4oZT19dj1zOMb3btGrABYdYGM9QlRNb38_9AkRqKCb</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>29528648</pqid></control><display><type>article</type><title>Pronunciation modeling using a finite-state transducer representation</title><source>Elsevier ScienceDirect Journals Complete</source><creator>Hazen, Timothy J. ; Hetherington, I. Lee ; Shu, Han ; Livescu, Karen</creator><creatorcontrib>Hazen, Timothy J. ; Hetherington, I. Lee ; Shu, Han ; Livescu, Karen</creatorcontrib><description>The MIT summit speech recognition system models pronunciation using a phonemic baseform dictionary along with rewrite rules for modeling phonological variation and multi-word reductions. Each pronunciation component is encoded within a finite-state transducer (FST) representation whose transition weights can be trained using an EM algorithm for finite-state networks. This paper explains the modeling approach we use and the details of its realization. We demonstrate the benefits and weaknesses of the approach both conceptually and empirically using the recognizer for our jupiter weather information system. 
Our experiments demonstrate that the use of phonological rewrite rules within our system achieves word error rate reductions between 4% and 9% over different test sets when compared against a system using no phonological rewrite rules.</description><identifier>ISSN: 0167-6393</identifier><identifier>EISSN: 1872-7182</identifier><identifier>DOI: 10.1016/j.specom.2005.03.004</identifier><identifier>CODEN: SCOMDH</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Applied sciences ; Exact sciences and technology ; Information, signal and communications theory ; Signal processing ; Speech processing ; Telecommunications and information theory</subject><ispartof>Speech communication, 2005-06, Vol.46 (2), p.189-203</ispartof><rights>2005 Elsevier B.V.</rights><rights>2005 INIST-CNRS</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c475t-320e205d7154ad2bf14267ceaf1cb02aa0cdaf3f9d1ff2ce346924f5e98cca533</citedby><cites>FETCH-LOGICAL-c475t-320e205d7154ad2bf14267ceaf1cb02aa0cdaf3f9d1ff2ce346924f5e98cca533</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.specom.2005.03.004$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>309,310,314,780,784,789,790,3550,23930,23931,25140,27924,27925,45995</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=16928893$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Hazen, Timothy J.</creatorcontrib><creatorcontrib>Hetherington, I. 
Lee</creatorcontrib><creatorcontrib>Shu, Han</creatorcontrib><creatorcontrib>Livescu, Karen</creatorcontrib><title>Pronunciation modeling using a finite-state transducer representation</title><title>Speech communication</title><description>The MIT summit speech recognition system models pronunciation using a phonemic baseform dictionary along with rewrite rules for modeling phonological variation and multi-word reductions. Each pronunciation component is encoded within a finite-state transducer (FST) representation whose transition weights can be trained using an EM algorithm for finite-state networks. This paper explains the modeling approach we use and the details of its realization. We demonstrate the benefits and weaknesses of the approach both conceptually and empirically using the recognizer for our jupiter weather information system. Our experiments demonstrate that the use of phonological rewrite rules within our system achieves word error rate reductions between 4% and 9% over different test sets when compared against a system using no phonological rewrite rules.</description><subject>Applied sciences</subject><subject>Exact sciences and technology</subject><subject>Information, signal and communications theory</subject><subject>Signal processing</subject><subject>Speech processing</subject><subject>Telecommunications and information 
theory</subject><issn>0167-6393</issn><issn>1872-7182</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2005</creationdate><recordtype>article</recordtype><recordid>eNqNkUtLxDAUhYMoOI7-Axfd6K41jz7SjSDD-IABXeg6ZG5vJEObjkkq-O9treBO3dy7-c653HMIOWc0Y5SVV7ss7BH6LuOUFhkVGaX5AVkwWfG0YpIfksWIVWkpanFMTkLY0ZGQki_I-sn3bnBgdbS9S7q-wda612QI09SJsc5GTEPUEZPotQvNAOgTj3uPAV380p2SI6PbgGffe0lebtfPq_t083j3sLrZpJBXRUwFp8hp0VSsyHXDt4blvKwAtWGwpVxrCo02wtQNM4YDiryseW4KrCWALoRYksvZd-_7twFDVJ0NgG2rHfZDULIoBaMV-wcohCgr-SfI64LLMp_AfAbB9yF4NGrvbaf9h2JUTS2onZpbUFMLigo1ZjzKLr79dQDdmjFBsOFHOz4oZT19dj1zOMb3btGrABYdYGM9QlRNb38_9AkRqKCb</recordid><startdate>20050601</startdate><enddate>20050601</enddate><creator>Hazen, Timothy J.</creator><creator>Hetherington, I. Lee</creator><creator>Shu, Han</creator><creator>Livescu, Karen</creator><general>Elsevier B.V</general><general>Elsevier</general><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>8BM</scope><scope>7T9</scope></search><sort><creationdate>20050601</creationdate><title>Pronunciation modeling using a finite-state transducer representation</title><author>Hazen, Timothy J. ; Hetherington, I. 
Lee ; Shu, Han ; Livescu, Karen</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c475t-320e205d7154ad2bf14267ceaf1cb02aa0cdaf3f9d1ff2ce346924f5e98cca533</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2005</creationdate><topic>Applied sciences</topic><topic>Exact sciences and technology</topic><topic>Information, signal and communications theory</topic><topic>Signal processing</topic><topic>Speech processing</topic><topic>Telecommunications and information theory</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Hazen, Timothy J.</creatorcontrib><creatorcontrib>Hetherington, I. Lee</creatorcontrib><creatorcontrib>Shu, Han</creatorcontrib><creatorcontrib>Livescu, Karen</creatorcontrib><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ComDisDome</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><jtitle>Speech communication</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Hazen, Timothy J.</au><au>Hetherington, I. 
Lee</au><au>Shu, Han</au><au>Livescu, Karen</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Pronunciation modeling using a finite-state transducer representation</atitle><jtitle>Speech communication</jtitle><date>2005-06-01</date><risdate>2005</risdate><volume>46</volume><issue>2</issue><spage>189</spage><epage>203</epage><pages>189-203</pages><issn>0167-6393</issn><eissn>1872-7182</eissn><coden>SCOMDH</coden><abstract>The MIT summit speech recognition system models pronunciation using a phonemic baseform dictionary along with rewrite rules for modeling phonological variation and multi-word reductions. Each pronunciation component is encoded within a finite-state transducer (FST) representation whose transition weights can be trained using an EM algorithm for finite-state networks. This paper explains the modeling approach we use and the details of its realization. We demonstrate the benefits and weaknesses of the approach both conceptually and empirically using the recognizer for our jupiter weather information system. Our experiments demonstrate that the use of phonological rewrite rules within our system achieves word error rate reductions between 4% and 9% over different test sets when compared against a system using no phonological rewrite rules.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.specom.2005.03.004</doi><tpages>15</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0167-6393
ispartof	Speech communication, 2005-06, Vol.46 (2), p.189-203
issn	0167-6393 1872-7182
language	eng
recordid	cdi_proquest_miscellaneous_85631071
source	Elsevier ScienceDirect Journals Complete
subjects	Applied sciences; Exact sciences and technology; Information, signal and communications theory; Signal processing; Speech processing; Telecommunications and information theory
title	Pronunciation modeling using a finite-state transducer representation
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T13%3A55%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Pronunciation%20modeling%20using%20a%20finite-state%20transducer%20representation&rft.jtitle=Speech%20communication&rft.au=Hazen,%20Timothy%20J.&rft.date=2005-06-01&rft.volume=46&rft.issue=2&rft.spage=189&rft.epage=203&rft.pages=189-203&rft.issn=0167-6393&rft.eissn=1872-7182&rft.coden=SCOMDH&rft_id=info:doi/10.1016/j.specom.2005.03.004&rft_dat=%3Cproquest_cross%3E85631071%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=29528648&rft_id=info:pmid/&rft_els_id=S0167639305000361&rfr_iscdi=true