Joint prosody prediction and unit selection for concatenative speech synthesis

We describe how prosody prediction can be efficiently integrated with the unit selection process in a concatenative speech synthesizer under a weighted finite-state transducer (WFST) architecture. WFSTs representing prosody prediction and unit selection can be composed during synthesis, thus effecti...
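
As a rough illustration of the idea in the abstract, the sketch below contrasts a sequential pipeline (commit to the single best prosody label per word, then select units) with a joint search that keeps all prosody candidates alive while units are chosen. It is not the authors' implementation and does not use a WFST library: it is a plain dynamic-programming search in which costs add along a path and the cheapest path wins, which is the behaviour composition would give under a min-plus cost convention. All words, labels, unit identifiers, cost values, and function names are hypothetical.

```python
# Hypothetical toy data: every label, unit id, and cost below is invented.
# Prosody model: per word, candidate symbolic labels with prediction costs.
PROSODY = {
    "boston": [("accent", 0.2), ("none", 0.9)],
    "flight": [("accent", 0.6), ("none", 0.4)],
}
# Unit database: (word, label) -> candidate units with target costs.
UNITS = {
    ("boston", "accent"): [("b1", 1.5)],
    ("boston", "none"): [("b2", 0.1)],
    ("flight", "accent"): [("f1", 0.3)],
    ("flight", "none"): [("f2", 0.2)],
}


def join_cost(prev_unit, unit):
    """Toy concatenation cost between adjacent units (invented values)."""
    return 0.0 if (prev_unit, unit) == ("b2", "f2") else 0.5


def best_path(words, prosody):
    """Viterbi search: costs add along a path, the cheapest path wins."""
    # frontier maps the last unit chosen to (total cost, path so far)
    frontier = {None: (0.0, [])}
    for word in words:
        nxt = {}
        for label, p_cost in prosody[word]:
            for unit, t_cost in UNITS[(word, label)]:
                for prev, (cost, path) in frontier.items():
                    step = p_cost + t_cost
                    if prev is not None:
                        step += join_cost(prev, unit)
                    total = cost + step
                    if unit not in nxt or total < nxt[unit][0]:
                        nxt[unit] = (total, path + [(word, label, unit)])
        frontier = nxt
    return min(frontier.values(), key=lambda item: item[0])


def sequential(words):
    """Commit to the 1-best prosody label per word, then select units."""
    one_best = {w: [min(PROSODY[w], key=lambda lc: lc[1])] for w in words}
    return best_path(words, one_best)


def joint(words):
    """Keep every prosody candidate alive during unit selection."""
    return best_path(words, PROSODY)


WORDS = ["boston", "flight"]
seq_cost, seq_path = sequential(WORDS)
joint_cost, joint_path = joint(WORDS)
print("sequential:", round(seq_cost, 3), seq_path)
print("joint:     ", round(joint_cost, 3), joint_path)
```

Because every path available to the sequential pipeline is also available to the joint search at the same cost, the joint result can never be more expensive under this cost model. In the toy data the joint search accepts a slightly worse prosody label in exchange for units that fit and join better. Whether a lower total cost translates into more natural speech is an empirical question, which the paper addresses with perceptual experiments.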

Detailed description

Bibliographic details
Main authors: Bulyko, I.; Ostendorf, M.
Format: Conference proceeding
Language: eng
Subjects:
Online access: Order full text
container_end_page 784 vol.2
container_issue
container_start_page 781
container_title
container_volume 2
creator Bulyko, I.
Ostendorf, M.
description We describe how prosody prediction can be efficiently integrated with the unit selection process in a concatenative speech synthesizer under a weighted finite-state transducer (WFST) architecture. WFSTs representing prosody prediction and unit selection can be composed during synthesis, thus effectively expanding the space of possible prosodic targets. We implemented a symbolic prosody prediction module and a unit selection database as the synthesis components of a travel planning system. Results of perceptual experiments show that by combining the steps of prosody prediction and unit selection we are able to achieve improved naturalness of synthetic speech compared to the sequential implementation.
doi_str_mv 10.1109/ICASSP.2001.941031
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_941031</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>941031</ieee_id><sourcerecordid>941031</sourcerecordid><originalsourceid>FETCH-LOGICAL-i87t-8da4c6fd8c60ffbbe43b38d2bca10be390c584012c2ed093bfccdcac33eb690a3</originalsourceid><addsrcrecordid>eNotkM1KAzEURoM_YFt9ga7yAjPeTNKZZClFrVJUaBfuSnJzh0ZqZphEYd7ewrg68C0OH4expYBSCDD3L-uH3e6jrABEaZQAKS7YrJKNKYSBz0s2h0aDbEAJdcVmYlVBUQtlbtg8pS8A0I3SM_b22oWYeT90qfPjmeQD5tBFbqPnPzFknuhE09R2A8cuos0UbQ6_xFNPhEeexpiPlEK6ZdetPSW6--eC7Z8e9-tNsX1_Ph_eFkE3udDeKqxbr7GGtnWOlHRS-8qhFeBIGsCVViAqrMiDka5F9GhRSnK1ASsXbDlpAxEd-iF822E8TBXkHyYXUn4</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Joint prosody prediction and unit selection for concatenative speech synthesis</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Bulyko, I. ; Ostendorf, M.</creator><creatorcontrib>Bulyko, I. ; Ostendorf, M.</creatorcontrib><description>We describe how prosody prediction can be efficiently integrated with the unit selection process in a concatenative speech synthesizer under a weighted finite-state transducer (WFST) architecture. WFSTs representing prosody prediction and unit selection can be composed during synthesis, thus effectively expanding the space of possible prosodic targets. We implemented a symbolic prosody prediction module and a unit selection database as the synthesis components of a travel planning system. 
Results of perceptual experiments show that by combining the steps of prosody prediction and unit selection we are able to achieve improved naturalness of synthetic speech compared to the sequential implementation.</description><identifier>ISSN: 1520-6149</identifier><identifier>ISBN: 0780370414</identifier><identifier>ISBN: 9780780370418</identifier><identifier>EISSN: 2379-190X</identifier><identifier>DOI: 10.1109/ICASSP.2001.941031</identifier><language>eng</language><publisher>IEEE</publisher><subject>Computer interfaces ; Cost function ; Diversity reception ; Space exploration ; Speech processing ; Speech synthesis ; Synthesizers ; Telephony ; Transducers</subject><ispartof>2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), 2001, Vol.2, p.781-784 vol.2</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/941031$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2058,4050,4051,27925,54920</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/941031$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Bulyko, I.</creatorcontrib><creatorcontrib>Ostendorf, M.</creatorcontrib><title>Joint prosody prediction and unit selection for concatenative speech synthesis</title><title>2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221)</title><addtitle>ICASSP</addtitle><description>We describe how prosody prediction can be efficiently integrated with the unit selection process in a concatenative speech synthesizer under a weighted finite-state transducer (WFST) architecture. 
WFSTs representing prosody prediction and unit selection can be composed during synthesis, thus effectively expanding the space of possible prosodic targets. We implemented a symbolic prosody prediction module and a unit selection database as the synthesis components of a travel planning system. Results of perceptual experiments show that by combining the steps of prosody prediction and unit selection we are able to achieve improved naturalness of synthetic speech compared to the sequential implementation.</description><subject>Computer interfaces</subject><subject>Cost function</subject><subject>Diversity reception</subject><subject>Space exploration</subject><subject>Speech processing</subject><subject>Speech synthesis</subject><subject>Synthesizers</subject><subject>Telephony</subject><subject>Transducers</subject><issn>1520-6149</issn><issn>2379-190X</issn><isbn>0780370414</isbn><isbn>9780780370418</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2001</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNotkM1KAzEURoM_YFt9ga7yAjPeTNKZZClFrVJUaBfuSnJzh0ZqZphEYd7ewrg68C0OH4expYBSCDD3L-uH3e6jrABEaZQAKS7YrJKNKYSBz0s2h0aDbEAJdcVmYlVBUQtlbtg8pS8A0I3SM_b22oWYeT90qfPjmeQD5tBFbqPnPzFknuhE09R2A8cuos0UbQ6_xFNPhEeexpiPlEK6ZdetPSW6--eC7Z8e9-tNsX1_Ph_eFkE3udDeKqxbr7GGtnWOlHRS-8qhFeBIGsCVViAqrMiDka5F9GhRSnK1ASsXbDlpAxEd-iF822E8TBXkHyYXUn4</recordid><startdate>2001</startdate><enddate>2001</enddate><creator>Bulyko, I.</creator><creator>Ostendorf, M.</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>2001</creationdate><title>Joint prosody prediction and unit selection for concatenative speech synthesis</title><author>Bulyko, I. 
; Ostendorf, M.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i87t-8da4c6fd8c60ffbbe43b38d2bca10be390c584012c2ed093bfccdcac33eb690a3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2001</creationdate><topic>Computer interfaces</topic><topic>Cost function</topic><topic>Diversity reception</topic><topic>Space exploration</topic><topic>Speech processing</topic><topic>Speech synthesis</topic><topic>Synthesizers</topic><topic>Telephony</topic><topic>Transducers</topic><toplevel>online_resources</toplevel><creatorcontrib>Bulyko, I.</creatorcontrib><creatorcontrib>Ostendorf, M.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Bulyko, I.</au><au>Ostendorf, M.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Joint prosody prediction and unit selection for concatenative speech synthesis</atitle><btitle>2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221)</btitle><stitle>ICASSP</stitle><date>2001</date><risdate>2001</risdate><volume>2</volume><spage>781</spage><epage>784 vol.2</epage><pages>781-784 vol.2</pages><issn>1520-6149</issn><eissn>2379-190X</eissn><isbn>0780370414</isbn><isbn>9780780370418</isbn><abstract>We describe how prosody prediction can be efficiently integrated with the unit selection process in a concatenative speech synthesizer under a weighted finite-state transducer (WFST) architecture. 
WFSTs representing prosody prediction and unit selection can be composed during synthesis, thus effectively expanding the space of possible prosodic targets. We implemented a symbolic prosody prediction module and a unit selection database as the synthesis components of a travel planning system. Results of perceptual experiments show that by combining the steps of prosody prediction and unit selection we are able to achieve improved naturalness of synthetic speech compared to the sequential implementation.</abstract><pub>IEEE</pub><doi>10.1109/ICASSP.2001.941031</doi></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1520-6149
ispartof 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), 2001, Vol.2, p.781-784 vol.2
issn 1520-6149
2379-190X
language eng
recordid cdi_ieee_primary_941031
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Computer interfaces
Cost function
Diversity reception
Space exploration
Speech processing
Speech synthesis
Synthesizers
Telephony
Transducers
title Joint prosody prediction and unit selection for concatenative speech synthesis
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T00%3A41%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Joint%20prosody%20prediction%20and%20unit%20selection%20for%20concatenative%20speech%20synthesis&rft.btitle=2001%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech,%20and%20Signal%20Processing.%20Proceedings%20(Cat.%20No.01CH37221)&rft.au=Bulyko,%20I.&rft.date=2001&rft.volume=2&rft.spage=781&rft.epage=784%20vol.2&rft.pages=781-784%20vol.2&rft.issn=1520-6149&rft.eissn=2379-190X&rft.isbn=0780370414&rft.isbn_list=9780780370418&rft_id=info:doi/10.1109/ICASSP.2001.941031&rft_dat=%3Cieee_6IE%3E941031%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=941031&rfr_iscdi=true