Low-Resource Speech Translation of Urdu to English Using Semi-Supervised Part-of-Speech Tagging and Transliteration

This paper describes the construction of ASR and MT systems for translation of speech from Urdu into English. As both Urdu pronunciation lexicons and Urdu-English bitexts are sparse, we employ several techniques that make use of semi-supervised annotation to improve ASR and MT training. Specifically...

Bibliographic Details
Main authors: Aminzadeh, A R; Shen, Wade
Format: Report
Language: eng
creator Aminzadeh, A R
Shen, Wade
description This paper describes the construction of ASR and MT systems for translation of speech from Urdu into English. As both Urdu pronunciation lexicons and Urdu-English bitexts are sparse, we employ several techniques that make use of semi-supervised annotation to improve ASR and MT training. Specifically, we describe 1) the construction of a semi-supervised HMM-based part-of-speech tagger that is used to train factored translation models and 2) the use of an I-HMM-based transliterator from which we derive a spelling-to-pronunciation model for Urdu used in ASR training. We describe experiments performed for both ASR and MT training in the context of the Urdu-to-English task of the NIST MT08 Evaluation and we compare methods making use of additional annotation with standard statistical MT and ASR baselines.
format Report
fullrecord <record><control><sourceid>dtic_1RU</sourceid><recordid>TN_cdi_dtic_stinet_ADA519247</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>ADA519247</sourcerecordid><originalsourceid>FETCH-dtic_stinet_ADA5192473</originalsourceid><addsrcrecordid>eNqFzL0KwjAUhuEuDqLegcO5gQz-IY5FKw4OYtq5hOY0PVCTknOity9KnZ2-4f14phlfw0vdkUOKDYIeEJsOymg890YoeAgtVNEmkACFdz1xBxWTd6DxQUqnAeOTGC3cTBQVWvUzjHOfm_F29Egwfs15NmlNz7gYd5Ytz0V5vCgr1NQs5FHq_JTvVof1dr_5k98nRED-</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>report</recordtype></control><display><type>report</type><title>Low-Resource Speech Translation of Urdu to English Using Semi-Supervised Part-of-Speech Tagging and Transliteration</title><source>DTIC Technical Reports</source><creator>Aminzadeh, A R ; Shen, Wade</creator><creatorcontrib>Aminzadeh, A R ; Shen, Wade ; MASSACHUSETTS INST OF TECH LEXINGTON LINCOLN LAB</creatorcontrib><description>This paper describes the construction of ASR and MT systems for translation of speech from Urdu into English. As both Urdu pronunciation lexicons and Urdu-English bitexts are sparse, we employ several techniques that make use of semi-supervised annotation to improve ASR and MT training. Specifically, we describe 1) the construction of a semi-supervised HMM-based part-of-speech tagger that is used to train factored translation models and 2) the use of an I-HMM-based transliterator from which we derive a spelling-to-pronunciation model for Urdu used in ASR training. 
We describe experiments performed for both ASR and MT training in the context of the Urdu-to-English task of the NIST MT08 Evaluation and we compare methods making use of additional annotation with standard statistical MT and ASR baselines.</description><language>eng</language><subject>ENGLISH LANGUAGE ; LEARNING ; Linguistics ; SPEECH ; TRANSLATIONS ; TRANSLITERATION ; URDU LANGUAGE</subject><creationdate>2008</creationdate><rights>Approved for public release; distribution is unlimited.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,776,881,27546,27547</link.rule.ids><linktorsrc>$$Uhttps://apps.dtic.mil/sti/citations/ADA519247$$EView_record_in_DTIC$$FView_record_in_$$GDTIC$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Aminzadeh, A R</creatorcontrib><creatorcontrib>Shen, Wade</creatorcontrib><creatorcontrib>MASSACHUSETTS INST OF TECH LEXINGTON LINCOLN LAB</creatorcontrib><title>Low-Resource Speech Translation of Urdu to English Using Semi-Supervised Part-of-Speech Tagging and Transliteration</title><description>This paper describes the construction of ASR and MT systems for translation of speech from Urdu into English. As both Urdu pronunciation lexicons and Urdu-English bitexts are sparse, we employ several techniques that make use of semi-supervised annotation to improve ASR and MT training. Specifically, we describe 1) the construction of a semi-supervised HMM-based part-of-speech tagger that is used to train factored translation models and 2) the use of an I-HMM-based transliterator from which we derive a spelling-to-pronunciation model for Urdu used in ASR training. 
We describe experiments performed for both ASR and MT training in the context of the Urdu-to-English task of the NIST MT08 Evaluation and we compare methods making use of additional annotation with standard statistical MT and ASR baselines.</description><subject>ENGLISH LANGUAGE</subject><subject>LEARNING</subject><subject>Linguistics</subject><subject>SPEECH</subject><subject>TRANSLATIONS</subject><subject>TRANSLITERATION</subject><subject>URDU LANGUAGE</subject><fulltext>true</fulltext><rsrctype>report</rsrctype><creationdate>2008</creationdate><recordtype>report</recordtype><sourceid>1RU</sourceid><recordid>eNqFzL0KwjAUhuEuDqLegcO5gQz-IY5FKw4OYtq5hOY0PVCTknOity9KnZ2-4f14phlfw0vdkUOKDYIeEJsOymg890YoeAgtVNEmkACFdz1xBxWTd6DxQUqnAeOTGC3cTBQVWvUzjHOfm_F29Egwfs15NmlNz7gYd5Ytz0V5vCgr1NQs5FHq_JTvVof1dr_5k98nRED-</recordid><startdate>200801</startdate><enddate>200801</enddate><creator>Aminzadeh, A R</creator><creator>Shen, Wade</creator><scope>1RU</scope><scope>BHM</scope></search><sort><creationdate>200801</creationdate><title>Low-Resource Speech Translation of Urdu to English Using Semi-Supervised Part-of-Speech Tagging and Transliteration</title><author>Aminzadeh, A R ; Shen, Wade</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-dtic_stinet_ADA5192473</frbrgroupid><rsrctype>reports</rsrctype><prefilter>reports</prefilter><language>eng</language><creationdate>2008</creationdate><topic>ENGLISH LANGUAGE</topic><topic>LEARNING</topic><topic>Linguistics</topic><topic>SPEECH</topic><topic>TRANSLATIONS</topic><topic>TRANSLITERATION</topic><topic>URDU LANGUAGE</topic><toplevel>online_resources</toplevel><creatorcontrib>Aminzadeh, A R</creatorcontrib><creatorcontrib>Shen, Wade</creatorcontrib><creatorcontrib>MASSACHUSETTS INST OF TECH LEXINGTON LINCOLN LAB</creatorcontrib><collection>DTIC Technical Reports</collection><collection>DTIC STINET</collection></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Aminzadeh, A R</au><au>Shen, Wade</au><aucorp>MASSACHUSETTS INST OF TECH LEXINGTON LINCOLN LAB</aucorp><format>book</format><genre>unknown</genre><ristype>RPRT</ristype><btitle>Low-Resource Speech Translation of Urdu to English Using Semi-Supervised Part-of-Speech Tagging and Transliteration</btitle><date>2008-01</date><risdate>2008</risdate><abstract>This paper describes the construction of ASR and MT systems for translation of speech from Urdu into English. As both Urdu pronunciation lexicons and Urdu-English bitexts are sparse, we employ several techniques that make use of semi-supervised annotation to improve ASR and MT training. Specifically, we describe 1) the construction of a semi-supervised HMM-based part-of-speech tagger that is used to train factored translation models and 2) the use of an I-HMM-based transliterator from which we derive a spelling-to-pronunciation model for Urdu used in ASR training. We describe experiments performed for both ASR and MT training in the context of the Urdu-to-English task of the NIST MT08 Evaluation and we compare methods making use of additional annotation with standard statistical MT and ASR baselines.</abstract><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
language eng
recordid cdi_dtic_stinet_ADA519247
source DTIC Technical Reports
subjects ENGLISH LANGUAGE
LEARNING
Linguistics
SPEECH
TRANSLATIONS
TRANSLITERATION
URDU LANGUAGE
title Low-Resource Speech Translation of Urdu to English Using Semi-Supervised Part-of-Speech Tagging and Transliteration
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T08%3A56%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-dtic_1RU&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.btitle=Low-Resource%20Speech%20Translation%20of%20Urdu%20to%20English%20Using%20Semi-Supervised%20Part-of-Speech%20Tagging%20and%20Transliteration&rft.au=Aminzadeh,%20A%20R&rft.aucorp=MASSACHUSETTS%20INST%20OF%20TECH%20LEXINGTON%20LINCOLN%20LAB&rft.date=2008-01&rft_id=info:doi/&rft_dat=%3Cdtic_1RU%3EADA519247%3C/dtic_1RU%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true