Subjective evaluation of join cost and smoothing methods for unit selection speech synthesis

Bibliographic details
Published in:	IEEE transactions on audio, speech, and language processing, 2006-09, Vol. 14 (5), pp. 1763-1771
Main authors:	Vepa, J.; King, S.
Format:	Article
Language:	English
Subjects:	Applied sciences; Cepstral analysis; Cost function; Design engineering; Discontinuity; Exact sciences and technology; Humans; Information, signal and communications theory; Join cost; Lattices; linear dynamic models (LDM); Natural language processing; perceptual listening tests; Power measurement; Signal processing; Smoothing; Smoothing methods; Spectra; Speech processing; Speech recognition; Speech synthesis; Stimuli; Studies; Telecommunications and information theory; Testing; unit selection; Viterbi algorithm

container_end_page	1771
container_issue	5
container_start_page	1763
container_title	IEEE transactions on audio, speech, and language processing
container_volume	14
creator	Vepa, J.; King, S.
description	In unit selection-based concatenative speech synthesis, join cost (also known as concatenation cost), which measures how well two units can be joined together, is one of the main criteria for selecting appropriate units from the inventory. Usually, some form of local parameter smoothing is also needed to disguise the remaining discontinuities. This paper presents a subjective evaluation of three join cost functions and three smoothing methods. We also describe the design and performance of a listening test. The three join cost functions were taken from our previous study, where we proposed join cost functions derived from spectral distances, which have good correlations with perceptual scores obtained for a range of concatenation discontinuities. This evaluation allows us to further validate their ability to predict concatenation discontinuities. The units for synthesis stimuli are obtained from a state-of-the-art unit selection text-to-speech system: rVoice from Rhetorical Systems Ltd. In this paper, we report listeners' preferences for each join cost in combination with each smoothing method.
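The abstract describes join cost as one of the main criteria for choosing units from the inventory, but it does not define the three spectral-distance join costs the paper evaluates. As a minimal sketch only, the Python below assumes a plain Euclidean distance between the feature vectors (for example, cepstral coefficients) on either side of a join. This is a common baseline and not necessarily one of the paper's functions. The sketch shows how such a cost can be combined with a per-unit target cost in a Viterbi search over a lattice of candidate units. `Candidate`, `join_cost`, `select_units`, `join_weight` and the toy feature values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class Candidate:
    """Hypothetical database unit: a target cost plus its two boundary frames."""
    name: str
    target_cost: float
    first_frame: Vector  # spectral features (e.g. cepstra) at the unit's start
    last_frame: Vector   # spectral features at the unit's end


def join_cost(left: Candidate, right: Candidate) -> float:
    """Assumed join cost: Euclidean distance between the frames meeting at the join."""
    a, b = left.last_frame, right.first_frame
    if len(a) != len(b):
        raise ValueError("boundary frames must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_units(lattice: List[List[Candidate]],
                 join_weight: float = 1.0) -> Tuple[List[Candidate], float]:
    """Viterbi search minimising sum(target costs) + join_weight * sum(join costs)."""
    if not lattice or any(not column for column in lattice):
        raise ValueError("every target position needs at least one candidate")
    # best[i][j]: cheapest total cost of any path ending in candidate j of column i
    best = [[c.target_cost for c in lattice[0]]]
    back: List[List[int]] = [[-1] * len(lattice[0])]
    for i in range(1, len(lattice)):
        scores, pointers = [], []
        for cand in lattice[i]:
            options = [best[i - 1][k] + join_weight * join_cost(prev, cand)
                       for k, prev in enumerate(lattice[i - 1])]
            k_best = min(range(len(options)), key=options.__getitem__)
            scores.append(options[k_best] + cand.target_cost)
            pointers.append(k_best)
        best.append(scores)
        back.append(pointers)
    # Trace back from the cheapest candidate in the final column.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = []
    for i in range(len(lattice) - 1, -1, -1):
        path.append(lattice[i][j])
        j = back[i][j]
    return path[::-1], total


if __name__ == "__main__":
    toy = [
        [Candidate("a1", 1.0, [0.0, 0.0], [1.0, 0.0]),
         Candidate("a2", 0.5, [0.0, 0.0], [5.0, 5.0])],
        [Candidate("b1", 0.2, [1.0, 0.0], [0.0, 0.0]),
         Candidate("b2", 0.1, [9.0, 9.0], [0.0, 0.0])],
    ]
    path, cost = select_units(toy)
    print([c.name for c in path], round(cost, 3))  # ['a1', 'b1'] 1.2
```

In the toy lattice, `a2` has the lower target cost, but its poor spectral match at the join makes the path through `a1` cheaper. With `join_weight=0.0`, the join is ignored and the search picks `a2` and `b2` instead. No smoothing is applied here. Any discontinuity remaining at the chosen joins would be handled by a separate smoothing step, which is the other half of what the paper evaluates.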
doi_str_mv	10.1109/TSA.2005.858548
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_pascalfrancis_primary_18049106</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>1677995</ieee_id><sourcerecordid>2568845531</sourcerecordid><originalsourceid>FETCH-LOGICAL-c391t-f3f7518f082feaf94a0c131c2832bd318942184994a4930d2a2a7fad4c57caa73</originalsourceid><addsrcrecordid>eNpdkE1LAzEQhhdRsH6cPXgJgnhqTTZJkxxF_ALBg3oTQsxObMo2qTvZQv-9W1oUPM3APO_L8FTVGaMTxqi5fnu9mdSUyomWWgq9V42YlHqsTC32f3c2PayOEOeUCj4VbFR9vPafc_AlroDAyrW9KzEnkgOZ55iIz1iISw3BRc5lFtMXWUCZ5QZJyB3pUywEod0UDClcAvgZwXUqM8CIJ9VBcC3C6W4eV-_3d2-3j-Pnl4en25vnseeGlXHgQUmmA9V1ABeMcNQzznytef3ZcKaNqJkWZjgIw2lTu9qp4BrhpfLOKX5cXW17l13-7gGLXUT00LYuQe7Ram24MlLIgbz4R85z36XhOWuYEpwqRgfoegv5LiN2EOyyiwvXrS2jduPaDq7txrXduh4Sl7tah961oXPJR_yLaSoMo9OBO99yEQD-zlOljJH8BxQ4h-Y</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>917430710</pqid></control><display><type>article</type><title>Subjective evaluation of join cost and smoothing methods for unit selection speech synthesis</title><source>IEEE Electronic Library (IEL)</source><creator>Vepa, J. ; King, S.</creator><creatorcontrib>Vepa, J. ; King, S.</creatorcontrib><description>In unit selection-based concatenative speech synthesis, join cost (also known as concatenation cost), which measures how well two units can be joined together, is one of the main criteria for selecting appropriate units from the inventory. Usually, some form of local parameter smoothing is also needed to disguise the remaining discontinuities. This paper presents a subjective evaluation of three join cost functions and three smoothing methods. We also describe the design and performance of a listening test. 
The three join cost functions were taken from our previous study, where we proposed join cost functions derived from spectral distances, which have good correlations with perceptual scores obtained for a range of concatenation discontinuities. This evaluation allows us to further validate their ability to predict concatenation discontinuities. The units for synthesis stimuli are obtained from a state-of-the-art unit selection text-to-speech system: rVoice from Rhetorical Systems Ltd. In this paper, we report listeners' preferences for each join cost in combination with each smoothing method</description><identifier>ISSN: 1558-7916</identifier><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 1558-7924</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TSA.2005.858548</identifier><identifier>CODEN: ITASD8</identifier><language>eng</language><publisher>Piscataway, NJ: IEEE</publisher><subject>Applied sciences ; Cepstral analysis ; Cost function ; Design engineering ; Discontinuity ; Exact sciences and technology ; Humans ; Information, signal and communications theory ; Join cost ; Lattices ; linear dynamic models (LDM) ; Natural language processing ; perceptual listening tests ; Power measurement ; Signal processing ; Smoothing ; Smoothing methods ; Spectra ; Speech processing ; Speech recognition ; Speech synthesis ; Stimuli ; Studies ; Telecommunications and information theory ; Testing ; unit selection ; Viterbi algorithm</subject><ispartof>IEEE transactions on audio, speech, and language processing, 2006-09, Vol.14 (5), p.1763-1771</ispartof><rights>2006 INIST-CNRS</rights><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2006</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c391t-f3f7518f082feaf94a0c131c2832bd318942184994a4930d2a2a7fad4c57caa73</citedby><cites>FETCH-LOGICAL-c391t-f3f7518f082feaf94a0c131c2832bd318942184994a4930d2a2a7fad4c57caa73</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/1677995$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/1677995$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=18049106$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Vepa, J.</creatorcontrib><creatorcontrib>King, S.</creatorcontrib><title>Subjective evaluation of join cost and smoothing methods for unit selection speech synthesis</title><title>IEEE transactions on audio, speech, and language processing</title><addtitle>TASL</addtitle><description>In unit selection-based concatenative speech synthesis, join cost (also known as concatenation cost), which measures how well two units can be joined together, is one of the main criteria for selecting appropriate units from the inventory. Usually, some form of local parameter smoothing is also needed to disguise the remaining discontinuities. This paper presents a subjective evaluation of three join cost functions and three smoothing methods. We also describe the design and performance of a listening test. 
The three join cost functions were taken from our previous study, where we proposed join cost functions derived from spectral distances, which have good correlations with perceptual scores obtained for a range of concatenation discontinuities. This evaluation allows us to further validate their ability to predict concatenation discontinuities. The units for synthesis stimuli are obtained from a state-of-the-art unit selection text-to-speech system: rVoice from Rhetorical Systems Ltd. In this paper, we report listeners' preferences for each join cost in combination with each smoothing method</description><subject>Applied sciences</subject><subject>Cepstral analysis</subject><subject>Cost function</subject><subject>Design engineering</subject><subject>Discontinuity</subject><subject>Exact sciences and technology</subject><subject>Humans</subject><subject>Information, signal and communications theory</subject><subject>Join cost</subject><subject>Lattices</subject><subject>linear dynamic models (LDM)</subject><subject>Natural language processing</subject><subject>perceptual listening tests</subject><subject>Power measurement</subject><subject>Signal processing</subject><subject>Smoothing</subject><subject>Smoothing methods</subject><subject>Spectra</subject><subject>Speech processing</subject><subject>Speech recognition</subject><subject>Speech synthesis</subject><subject>Stimuli</subject><subject>Studies</subject><subject>Telecommunications and information theory</subject><subject>Testing</subject><subject>unit selection</subject><subject>Viterbi 
algorithm</subject><issn>1558-7916</issn><issn>2329-9290</issn><issn>1558-7924</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2006</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkE1LAzEQhhdRsH6cPXgJgnhqTTZJkxxF_ALBg3oTQsxObMo2qTvZQv-9W1oUPM3APO_L8FTVGaMTxqi5fnu9mdSUyomWWgq9V42YlHqsTC32f3c2PayOEOeUCj4VbFR9vPafc_AlroDAyrW9KzEnkgOZ55iIz1iISw3BRc5lFtMXWUCZ5QZJyB3pUywEod0UDClcAvgZwXUqM8CIJ9VBcC3C6W4eV-_3d2-3j-Pnl4en25vnseeGlXHgQUmmA9V1ABeMcNQzznytef3ZcKaNqJkWZjgIw2lTu9qp4BrhpfLOKX5cXW17l13-7gGLXUT00LYuQe7Ram24MlLIgbz4R85z36XhOWuYEpwqRgfoegv5LiN2EOyyiwvXrS2jduPaDq7txrXduh4Sl7tah961oXPJR_yLaSoMo9OBO99yEQD-zlOljJH8BxQ4h-Y</recordid><startdate>20060901</startdate><enddate>20060901</enddate><creator>Vepa, J.</creator><creator>King, S.</creator><general>IEEE</general><general>Institute of Electrical and Electronics Engineers</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20060901</creationdate><title>Subjective evaluation of join cost and smoothing methods for unit selection speech synthesis</title><author>Vepa, J. 
; King, S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c391t-f3f7518f082feaf94a0c131c2832bd318942184994a4930d2a2a7fad4c57caa73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Applied sciences</topic><topic>Cepstral analysis</topic><topic>Cost function</topic><topic>Design engineering</topic><topic>Discontinuity</topic><topic>Exact sciences and technology</topic><topic>Humans</topic><topic>Information, signal and communications theory</topic><topic>Join cost</topic><topic>Lattices</topic><topic>linear dynamic models (LDM)</topic><topic>Natural language processing</topic><topic>perceptual listening tests</topic><topic>Power measurement</topic><topic>Signal processing</topic><topic>Smoothing</topic><topic>Smoothing methods</topic><topic>Spectra</topic><topic>Speech processing</topic><topic>Speech recognition</topic><topic>Speech synthesis</topic><topic>Stimuli</topic><topic>Studies</topic><topic>Telecommunications and information theory</topic><topic>Testing</topic><topic>unit selection</topic><topic>Viterbi algorithm</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Vepa, J.</creatorcontrib><creatorcontrib>King, S.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts 
Professional</collection><jtitle>IEEE transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Vepa, J.</au><au>King, S.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Subjective evaluation of join cost and smoothing methods for unit selection speech synthesis</atitle><jtitle>IEEE transactions on audio, speech, and language processing</jtitle><stitle>TASL</stitle><date>2006-09-01</date><risdate>2006</risdate><volume>14</volume><issue>5</issue><spage>1763</spage><epage>1771</epage><pages>1763-1771</pages><issn>1558-7916</issn><issn>2329-9290</issn><eissn>1558-7924</eissn><eissn>2329-9304</eissn><coden>ITASD8</coden><abstract>In unit selection-based concatenative speech synthesis, join cost (also known as concatenation cost), which measures how well two units can be joined together, is one of the main criteria for selecting appropriate units from the inventory. Usually, some form of local parameter smoothing is also needed to disguise the remaining discontinuities. This paper presents a subjective evaluation of three join cost functions and three smoothing methods. We also describe the design and performance of a listening test. The three join cost functions were taken from our previous study, where we proposed join cost functions derived from spectral distances, which have good correlations with perceptual scores obtained for a range of concatenation discontinuities. This evaluation allows us to further validate their ability to predict concatenation discontinuities. The units for synthesis stimuli are obtained from a state-of-the-art unit selection text-to-speech system: rVoice from Rhetorical Systems Ltd. 
In this paper, we report listeners' preferences for each join cost in combination with each smoothing method</abstract><cop>Piscataway, NJ</cop><pub>IEEE</pub><doi>10.1109/TSA.2005.858548</doi><tpages>9</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1558-7916
ispartof	IEEE transactions on audio, speech, and language processing, 2006-09, Vol.14 (5), p.1763-1771
issn	1558-7916 2329-9290 1558-7924 2329-9304
language	eng
recordid	cdi_pascalfrancis_primary_18049106
source	IEEE Electronic Library (IEL)
subjects	Applied sciences; Cepstral analysis; Cost function; Design engineering; Discontinuity; Exact sciences and technology; Humans; Information, signal and communications theory; Join cost; Lattices; linear dynamic models (LDM); Natural language processing; perceptual listening tests; Power measurement; Signal processing; Smoothing; Smoothing methods; Spectra; Speech processing; Speech recognition; Speech synthesis; Stimuli; Studies; Telecommunications and information theory; Testing; unit selection; Viterbi algorithm
title	Subjective evaluation of join cost and smoothing methods for unit selection speech synthesis
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T23%3A25%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Subjective%20evaluation%20of%20join%20cost%20and%20smoothing%20methods%20for%20unit%20selection%20speech%20synthesis&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Vepa,%20J.&rft.date=2006-09-01&rft.volume=14&rft.issue=5&rft.spage=1763&rft.epage=1771&rft.pages=1763-1771&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TSA.2005.858548&rft_dat=%3Cproquest_RIE%3E2568845531%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=917430710&rft_id=info:pmid/&rft_ieee_id=1677995&rfr_iscdi=true