Efficient integrated response generation from multiple targets using weighted finite state transducers

In this paper, we describe how language generation and speech synthesis for spoken dialog systems can be efficiently integrated under a weighted finite state transducer architecture. Taking advantage of this efficiency, we show that introducing flexible targets in generation leads to more natural so...

Detailed description

Bibliographic details
Published in:	Computer speech & language 2002-07, Vol.16 (3), p.533-550
Main authors:	Bulyko, Ivan; Ostendorf, Mari
Format:	Article
Language:	eng
Online access:	Full text

container_end_page	550
container_issue	3
container_start_page	533
container_title	Computer speech & language
container_volume	16
creator	Bulyko, Ivan ; Ostendorf, Mari
description	In this paper, we describe how language generation and speech synthesis for spoken dialog systems can be efficiently integrated under a weighted finite state transducer architecture. Taking advantage of this efficiency, we show that introducing flexible targets in generation leads to more natural sounding synthesis. Specifically, we allow multiple wordings of the response and multiple prosodic realizations of the different wordings. The choice of wording and prosodic structure are then jointly optimized with unit selection for waveform generation in speech synthesis. Results of perceptual experiments show that by integrating the steps of language generation and speech synthesis, we are able to achieve improved naturalness of synthetic speech compared to the sequential implementation.
doi_str_mv	10.1016/S0885-2308(02)00023-2
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_85569403</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0885230802000232</els_id><sourcerecordid>27144243</sourcerecordid><originalsourceid>FETCH-LOGICAL-c400t-457477b42df6b91749bb743e3ab3b9fd00364afb119dc6604ba4e7edd08ec5ad3</originalsourceid><addsrcrecordid>eNqNkU1LxDAQhoMouK7-BCEn0UN10qRfJxHxCwQP6jmkyaRGuumapIr_3tYVr3oaZnje9zAPIYcMThmw8uwR6rrIcg71MeQnAJDzLN8iCwZNkdW85Ntk8Yvskr0YXyeoLES1IPbKWqcd-kSdT9gFldDQgHE9-Ii0Q4_TyQ2e2jCs6Grsk1v3SJMKHaZIx-h8Rz_QdS9z0DrvEtKYphqagvLRjBpD3Cc7VvURD37mkjxfXz1d3mb3Dzd3lxf3mRYAKRNFJaqqFbmxZduwSjRtWwmOXLW8bawB4KVQtmWsMbosQbRKYIXGQI26UIYvydGmdx2GtxFjkisXNfa98jiMUdZFUTYC-D9APj1L1H-CecWEyMXcWGxAHYYYA1q5Dm6lwqdkIGdP8tuTnCVIyOW3p2lbkvNNDqe_vDsMMs4-NBoXUCdpBvdHwxcl5JyN</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>27144243</pqid></control><display><type>article</type><title>Efficient integrated response generation from multiple targets using weighted finite state transducers</title><source>ScienceDirect Journals (5 years ago - present)</source><creator>Bulyko, Ivan ; Ostendorf, Mari</creator><creatorcontrib>Bulyko, Ivan ; Ostendorf, Mari</creatorcontrib><description>In this paper, we describe how language generation and speech synthesis for spoken dialog systems can be efficiently integrated under a weighted finite state transducer architecture. Taking advantage of this efficiency, we show that introducing flexible targets in generation leads to more natural sounding synthesis. Specifically, we allow multiple wordings of the response and multiple prosodic realizations of the different wordings. The choice of wording and prosodic structure are then jointly optimized with unit selection for waveform generation in speech synthesis. 
Results of perceptual experiments show that by integrating the steps of language generation and speech synthesis, we are able to achieve improved naturalness of synthetic speech compared to the sequential implementation.</description><identifier>ISSN: 0885-2308</identifier><identifier>EISSN: 1095-8363</identifier><identifier>DOI: 10.1016/S0885-2308(02)00023-2</identifier><identifier>CODEN: CSPLEO</identifier><language>eng</language><publisher>Elsevier Ltd</publisher><ispartof>Computer speech & language, 2002-07, Vol.16 (3), p.533-550</ispartof><rights>2002</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c400t-457477b42df6b91749bb743e3ab3b9fd00364afb119dc6604ba4e7edd08ec5ad3</citedby><cites>FETCH-LOGICAL-c400t-457477b42df6b91749bb743e3ab3b9fd00364afb119dc6604ba4e7edd08ec5ad3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0885230802000232$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids></links><search><creatorcontrib>Bulyko, Ivan</creatorcontrib><creatorcontrib>Ostendorf, Mari</creatorcontrib><title>Efficient integrated response generation from multiple targets using weighted finite state transducers</title><title>Computer speech & language</title><description>In this paper, we describe how language generation and speech synthesis for spoken dialog systems can be efficiently integrated under a weighted finite state transducer architecture. Taking advantage of this efficiency, we show that introducing flexible targets in generation leads to more natural sounding synthesis. Specifically, we allow multiple wordings of the response and multiple prosodic realizations of the different wordings. 
The choice of wording and prosodic structure are then jointly optimized with unit selection for waveform generation in speech synthesis. Results of perceptual experiments show that by integrating the steps of language generation and speech synthesis, we are able to achieve improved naturalness of synthetic speech compared to the sequential implementation.</description><issn>0885-2308</issn><issn>1095-8363</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2002</creationdate><recordtype>article</recordtype><recordid>eNqNkU1LxDAQhoMouK7-BCEn0UN10qRfJxHxCwQP6jmkyaRGuumapIr_3tYVr3oaZnje9zAPIYcMThmw8uwR6rrIcg71MeQnAJDzLN8iCwZNkdW85Ntk8Yvskr0YXyeoLES1IPbKWqcd-kSdT9gFldDQgHE9-Ii0Q4_TyQ2e2jCs6Grsk1v3SJMKHaZIx-h8Rz_QdS9z0DrvEtKYphqagvLRjBpD3Cc7VvURD37mkjxfXz1d3mb3Dzd3lxf3mRYAKRNFJaqqFbmxZduwSjRtWwmOXLW8bawB4KVQtmWsMbosQbRKYIXGQI26UIYvydGmdx2GtxFjkisXNfa98jiMUdZFUTYC-D9APj1L1H-CecWEyMXcWGxAHYYYA1q5Dm6lwqdkIGdP8tuTnCVIyOW3p2lbkvNNDqe_vDsMMs4-NBoXUCdpBvdHwxcl5JyN</recordid><startdate>20020701</startdate><enddate>20020701</enddate><creator>Bulyko, Ivan</creator><creator>Ostendorf, Mari</creator><general>Elsevier Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>8BM</scope><scope>7T9</scope></search><sort><creationdate>20020701</creationdate><title>Efficient integrated response generation from multiple targets using weighted finite state transducers</title><author>Bulyko, Ivan ; Ostendorf, Mari</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c400t-457477b42df6b91749bb743e3ab3b9fd00364afb119dc6604ba4e7edd08ec5ad3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2002</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Bulyko, Ivan</creatorcontrib><creatorcontrib>Ostendorf, 
Mari</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ComDisDome</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><jtitle>Computer speech & language</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Bulyko, Ivan</au><au>Ostendorf, Mari</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Efficient integrated response generation from multiple targets using weighted finite state transducers</atitle><jtitle>Computer speech & language</jtitle><date>2002-07-01</date><risdate>2002</risdate><volume>16</volume><issue>3</issue><spage>533</spage><epage>550</epage><pages>533-550</pages><issn>0885-2308</issn><eissn>1095-8363</eissn><coden>CSPLEO</coden><abstract>In this paper, we describe how language generation and speech synthesis for spoken dialog systems can be efficiently integrated under a weighted finite state transducer architecture. Taking advantage of this efficiency, we show that introducing flexible targets in generation leads to more natural sounding synthesis. Specifically, we allow multiple wordings of the response and multiple prosodic realizations of the different wordings. The choice of wording and prosodic structure are then jointly optimized with unit selection for waveform generation in speech synthesis. 
Results of perceptual experiments show that by integrating the steps of language generation and speech synthesis, we are able to achieve improved naturalness of synthetic speech compared to the sequential implementation.</abstract><pub>Elsevier Ltd</pub><doi>10.1016/S0885-2308(02)00023-2</doi><tpages>18</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0885-2308
ispartof	Computer speech & language, 2002-07, Vol.16 (3), p.533-550
issn	0885-2308 1095-8363
language	eng
recordid	cdi_proquest_miscellaneous_85569403
source	ScienceDirect Journals (5 years ago - present)
title	Efficient integrated response generation from multiple targets using weighted finite state transducers
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T19%3A02%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Efficient%20integrated%20response%20generation%20from%20multiple%20targets%20using%20weighted%20finite%20state%20transducers&rft.jtitle=Computer%20speech%20&%20language&rft.au=Bulyko,%20Ivan&rft.date=2002-07-01&rft.volume=16&rft.issue=3&rft.spage=533&rft.epage=550&rft.pages=533-550&rft.issn=0885-2308&rft.eissn=1095-8363&rft.coden=CSPLEO&rft_id=info:doi/10.1016/S0885-2308(02)00023-2&rft_dat=%3Cproquest_cross%3E27144243%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=27144243&rft_id=info:pmid/&rft_els_id=S0885230802000232&rfr_iscdi=true