VTLN adaptation for statistical speech synthesis

Bibliographic Details
Main Authors: Saheer, Lakshmi; Garner, Philip N; Dines, John; Hui Liang
Format: Conference Proceedings
Language: English
container_start_page 4838
container_end_page 4841
creator Saheer, Lakshmi
Garner, Philip N
Dines, John
Hui Liang
description The advent of statistical speech synthesis has enabled the unification of the basic techniques used in speech synthesis and recognition. Adaptation techniques that have been successfully used in recognition systems can now be applied to synthesis systems to improve the quality of the synthesized speech. The application of vocal tract length normalization (VTLN) for synthesis is explored in this paper. VTLN based adaptation requires estimation of a single warping factor, which can be accurately estimated from very little adaptation data and gives additive improvements over CMLLR adaptation. The challenge of estimating accurate warping factors using higher order features is solved by initializing warping factor estimation with the values calculated from lower order features.
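The description above outlines the approach at a high level: a single warping factor is estimated from a small amount of adaptation data, and estimation on higher-order features is initialized from an estimate obtained on lower-order features. As a rough illustration of that idea only, the sketch below uses a bilinear (all-pass) frequency warp applied to cepstral features and a likelihood-based grid search under a single diagonal Gaussian; the warp construction, the scoring model, the coarse-to-fine grid, and all function and variable names are assumptions made for this example, not the authors' implementation.

# Illustrative sketch only (not the paper's implementation): estimating a single
# VTLN warping factor by maximising model likelihood of warped cepstral features.
import numpy as np

def bilinear_warp_matrix(alpha, order, n_freq=256):
    # Linear map A(alpha) with warped_cepstrum = A @ cepstrum: each basis
    # log-spectrum cos(j * omega) is resampled on a bilinear-warped frequency
    # axis and transformed back to the cepstral domain (assumed construction).
    omega = np.linspace(0.0, np.pi, n_freq)
    theta = omega + 2.0 * np.arctan(alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega)))
    basis_on_warped_axis = np.cos(np.outer(theta, np.arange(order)))    # (n_freq, order)
    inv_transform = np.cos(np.outer(np.arange(order), omega)) * (2.0 / n_freq)
    inv_transform[0, :] *= 0.5
    return inv_transform @ basis_on_warped_axis                          # (order, order)

def avg_log_likelihood(feats, mean, var):
    # Average per-frame log-likelihood under a single diagonal Gaussian,
    # standing in for the HMM/GMM acoustic model used in practice.
    return float(np.mean(np.sum(-0.5 * (np.log(2.0 * np.pi * var)
                                        + (feats - mean) ** 2 / var), axis=1)))

def estimate_alpha(feats, mean, var, grid):
    # Grid search: warp the adaptation features with each candidate alpha and
    # keep the value the model scores highest.
    scores = [avg_log_likelihood(feats @ bilinear_warp_matrix(a, feats.shape[1]).T,
                                 mean, var) for a in grid]
    return float(grid[int(np.argmax(scores))])

# Coarse-to-fine use, mirroring the idea of initialising the higher-order estimate
# from a lower-order one (orders, grid spacing, and names such as `feats`, `mean13`,
# `var13` are hypothetical):
# alpha0 = estimate_alpha(feats[:, :13], mean13, var13, np.linspace(-0.10, 0.10, 21))
# alpha  = estimate_alpha(feats, mean, var, np.linspace(alpha0 - 0.02, alpha0 + 0.02, 9))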
doi_str_mv 10.1109/ICASSP.2010.5495126
format Conference Proceeding
identifier ISBN: 9781424442959; 1424442958
identifier EISBN: 9781424442966; 1424442966
publisher IEEE
url https://ieeexplore.ieee.org/document/5495126
identifier ISSN: 1520-6149
ispartof 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010, p.4838-4841
issn 1520-6149
2379-190X
language eng
recordid cdi_ieee_primary_5495126
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Adaptation
Adaptation model
Automatic speech recognition
Cepstral analysis
Feature extraction
Frequency
Hidden Markov models
Maximum likelihood linear regression
Speech recognition
Speech synthesis
Statistical Speech Synthesis
Vectors
Vocal Tract Length Normalization
title VTLN adaptation for statistical speech synthesis