Subjective evaluation of join cost and smoothing methods for unit selection speech synthesis

In unit selection-based concatenative speech synthesis, join cost (also known as concatenation cost), which measures how well two units can be joined together, is one of the main criteria for selecting appropriate units from the inventory. Usually, some form of local parameter smoothing is also need...

Bibliographic details
Published in: IEEE transactions on audio, speech, and language processing, 2006-09, Vol.14 (5), p.1763-1771
Main authors: Vepa, J., King, S.
Format: Article
Language: eng
container_end_page 1771
container_issue 5
container_start_page 1763
container_title IEEE transactions on audio, speech, and language processing
container_volume 14
creator Vepa, J.
King, S.
description In unit selection-based concatenative speech synthesis, join cost (also known as concatenation cost), which measures how well two units can be joined together, is one of the main criteria for selecting appropriate units from the inventory. Usually, some form of local parameter smoothing is also needed to disguise the remaining discontinuities. This paper presents a subjective evaluation of three join cost functions and three smoothing methods. We also describe the design and performance of a listening test. The three join cost functions were taken from our previous study, where we proposed join cost functions derived from spectral distances, which have good correlations with perceptual scores obtained for a range of concatenation discontinuities. This evaluation allows us to further validate their ability to predict concatenation discontinuities. The units for synthesis stimuli are obtained from a state-of-the-art unit selection text-to-speech system: rVoice from Rhetorical Systems Ltd. In this paper, we report listeners' preferences for each join cost in combination with each smoothing method.
doi_str_mv 10.1109/TSA.2005.858548
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_pascalfrancis_primary_18049106</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>1677995</ieee_id><sourcerecordid>2568845531</sourcerecordid><originalsourceid>FETCH-LOGICAL-c391t-f3f7518f082feaf94a0c131c2832bd318942184994a4930d2a2a7fad4c57caa73</originalsourceid><addsrcrecordid>eNpdkE1LAzEQhhdRsH6cPXgJgnhqTTZJkxxF_ALBg3oTQsxObMo2qTvZQv-9W1oUPM3APO_L8FTVGaMTxqi5fnu9mdSUyomWWgq9V42YlHqsTC32f3c2PayOEOeUCj4VbFR9vPafc_AlroDAyrW9KzEnkgOZ55iIz1iISw3BRc5lFtMXWUCZ5QZJyB3pUywEod0UDClcAvgZwXUqM8CIJ9VBcC3C6W4eV-_3d2-3j-Pnl4en25vnseeGlXHgQUmmA9V1ABeMcNQzznytef3ZcKaNqJkWZjgIw2lTu9qp4BrhpfLOKX5cXW17l13-7gGLXUT00LYuQe7Ram24MlLIgbz4R85z36XhOWuYEpwqRgfoegv5LiN2EOyyiwvXrS2jduPaDq7txrXduh4Sl7tah961oXPJR_yLaSoMo9OBO99yEQD-zlOljJH8BxQ4h-Y</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>917430710</pqid></control><display><type>article</type><title>Subjective evaluation of join cost and smoothing methods for unit selection speech synthesis</title><source>IEEE Electronic Library (IEL)</source><creator>Vepa, J. ; King, S.</creator><creatorcontrib>Vepa, J. ; King, S.</creatorcontrib><description>In unit selection-based concatenative speech synthesis, join cost (also known as concatenation cost), which measures how well two units can be joined together, is one of the main criteria for selecting appropriate units from the inventory. Usually, some form of local parameter smoothing is also needed to disguise the remaining discontinuities. This paper presents a subjective evaluation of three join cost functions and three smoothing methods. We also describe the design and performance of a listening test. 
The three join cost functions were taken from our previous study, where we proposed join cost functions derived from spectral distances, which have good correlations with perceptual scores obtained for a range of concatenation discontinuities. This evaluation allows us to further validate their ability to predict concatenation discontinuities. The units for synthesis stimuli are obtained from a state-of-the-art unit selection text-to-speech system: rVoice from Rhetorical Systems Ltd. In this paper, we report listeners' preferences for each join cost in combination with each smoothing method</description><identifier>ISSN: 1558-7916</identifier><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 1558-7924</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TSA.2005.858548</identifier><identifier>CODEN: ITASD8</identifier><language>eng</language><publisher>Piscataway, NJ: IEEE</publisher><subject>Applied sciences ; Cepstral analysis ; Cost function ; Design engineering ; Discontinuity ; Exact sciences and technology ; Humans ; Information, signal and communications theory ; Join cost ; Lattices ; linear dynamic models (LDM) ; Natural language processing ; perceptual listening tests ; Power measurement ; Signal processing ; Smoothing ; Smoothing methods ; Spectra ; Speech processing ; Speech recognition ; Speech synthesis ; Stimuli ; Studies ; Telecommunications and information theory ; Testing ; unit selection ; Viterbi algorithm</subject><ispartof>IEEE transactions on audio, speech, and language processing, 2006-09, Vol.14 (5), p.1763-1771</ispartof><rights>2006 INIST-CNRS</rights><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2006</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c391t-f3f7518f082feaf94a0c131c2832bd318942184994a4930d2a2a7fad4c57caa73</citedby><cites>FETCH-LOGICAL-c391t-f3f7518f082feaf94a0c131c2832bd318942184994a4930d2a2a7fad4c57caa73</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/1677995$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/1677995$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=18049106$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Vepa, J.</creatorcontrib><creatorcontrib>King, S.</creatorcontrib><title>Subjective evaluation of join cost and smoothing methods for unit selection speech synthesis</title><title>IEEE transactions on audio, speech, and language processing</title><addtitle>TASL</addtitle><description>In unit selection-based concatenative speech synthesis, join cost (also known as concatenation cost), which measures how well two units can be joined together, is one of the main criteria for selecting appropriate units from the inventory. Usually, some form of local parameter smoothing is also needed to disguise the remaining discontinuities. This paper presents a subjective evaluation of three join cost functions and three smoothing methods. We also describe the design and performance of a listening test. 
The three join cost functions were taken from our previous study, where we proposed join cost functions derived from spectral distances, which have good correlations with perceptual scores obtained for a range of concatenation discontinuities. This evaluation allows us to further validate their ability to predict concatenation discontinuities. The units for synthesis stimuli are obtained from a state-of-the-art unit selection text-to-speech system: rVoice from Rhetorical Systems Ltd. In this paper, we report listeners' preferences for each join cost in combination with each smoothing method</description><subject>Applied sciences</subject><subject>Cepstral analysis</subject><subject>Cost function</subject><subject>Design engineering</subject><subject>Discontinuity</subject><subject>Exact sciences and technology</subject><subject>Humans</subject><subject>Information, signal and communications theory</subject><subject>Join cost</subject><subject>Lattices</subject><subject>linear dynamic models (LDM)</subject><subject>Natural language processing</subject><subject>perceptual listening tests</subject><subject>Power measurement</subject><subject>Signal processing</subject><subject>Smoothing</subject><subject>Smoothing methods</subject><subject>Spectra</subject><subject>Speech processing</subject><subject>Speech recognition</subject><subject>Speech synthesis</subject><subject>Stimuli</subject><subject>Studies</subject><subject>Telecommunications and information theory</subject><subject>Testing</subject><subject>unit selection</subject><subject>Viterbi 
algorithm</subject><issn>1558-7916</issn><issn>2329-9290</issn><issn>1558-7924</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2006</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkE1LAzEQhhdRsH6cPXgJgnhqTTZJkxxF_ALBg3oTQsxObMo2qTvZQv-9W1oUPM3APO_L8FTVGaMTxqi5fnu9mdSUyomWWgq9V42YlHqsTC32f3c2PayOEOeUCj4VbFR9vPafc_AlroDAyrW9KzEnkgOZ55iIz1iISw3BRc5lFtMXWUCZ5QZJyB3pUywEod0UDClcAvgZwXUqM8CIJ9VBcC3C6W4eV-_3d2-3j-Pnl4en25vnseeGlXHgQUmmA9V1ABeMcNQzznytef3ZcKaNqJkWZjgIw2lTu9qp4BrhpfLOKX5cXW17l13-7gGLXUT00LYuQe7Ram24MlLIgbz4R85z36XhOWuYEpwqRgfoegv5LiN2EOyyiwvXrS2jduPaDq7txrXduh4Sl7tah961oXPJR_yLaSoMo9OBO99yEQD-zlOljJH8BxQ4h-Y</recordid><startdate>20060901</startdate><enddate>20060901</enddate><creator>Vepa, J.</creator><creator>King, S.</creator><general>IEEE</general><general>Institute of Electrical and Electronics Engineers</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20060901</creationdate><title>Subjective evaluation of join cost and smoothing methods for unit selection speech synthesis</title><author>Vepa, J. 
; King, S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c391t-f3f7518f082feaf94a0c131c2832bd318942184994a4930d2a2a7fad4c57caa73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Applied sciences</topic><topic>Cepstral analysis</topic><topic>Cost function</topic><topic>Design engineering</topic><topic>Discontinuity</topic><topic>Exact sciences and technology</topic><topic>Humans</topic><topic>Information, signal and communications theory</topic><topic>Join cost</topic><topic>Lattices</topic><topic>linear dynamic models (LDM)</topic><topic>Natural language processing</topic><topic>perceptual listening tests</topic><topic>Power measurement</topic><topic>Signal processing</topic><topic>Smoothing</topic><topic>Smoothing methods</topic><topic>Spectra</topic><topic>Speech processing</topic><topic>Speech recognition</topic><topic>Speech synthesis</topic><topic>Stimuli</topic><topic>Studies</topic><topic>Telecommunications and information theory</topic><topic>Testing</topic><topic>unit selection</topic><topic>Viterbi algorithm</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Vepa, J.</creatorcontrib><creatorcontrib>King, S.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts 
Professional</collection><jtitle>IEEE transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Vepa, J.</au><au>King, S.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Subjective evaluation of join cost and smoothing methods for unit selection speech synthesis</atitle><jtitle>IEEE transactions on audio, speech, and language processing</jtitle><stitle>TASL</stitle><date>2006-09-01</date><risdate>2006</risdate><volume>14</volume><issue>5</issue><spage>1763</spage><epage>1771</epage><pages>1763-1771</pages><issn>1558-7916</issn><issn>2329-9290</issn><eissn>1558-7924</eissn><eissn>2329-9304</eissn><coden>ITASD8</coden><abstract>In unit selection-based concatenative speech synthesis, join cost (also known as concatenation cost), which measures how well two units can be joined together, is one of the main criteria for selecting appropriate units from the inventory. Usually, some form of local parameter smoothing is also needed to disguise the remaining discontinuities. This paper presents a subjective evaluation of three join cost functions and three smoothing methods. We also describe the design and performance of a listening test. The three join cost functions were taken from our previous study, where we proposed join cost functions derived from spectral distances, which have good correlations with perceptual scores obtained for a range of concatenation discontinuities. This evaluation allows us to further validate their ability to predict concatenation discontinuities. The units for synthesis stimuli are obtained from a state-of-the-art unit selection text-to-speech system: rVoice from Rhetorical Systems Ltd. 
In this paper, we report listeners' preferences for each join cost in combination with each smoothing method</abstract><cop>Piscataway, NJ</cop><pub>IEEE</pub><doi>10.1109/TSA.2005.858548</doi><tpages>9</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1558-7916
ispartof IEEE transactions on audio, speech, and language processing, 2006-09, Vol.14 (5), p.1763-1771
issn 1558-7916
2329-9290
1558-7924
2329-9304
language eng
recordid cdi_pascalfrancis_primary_18049106
source IEEE Electronic Library (IEL)
subjects Applied sciences
Cepstral analysis
Cost function
Design engineering
Discontinuity
Exact sciences and technology
Humans
Information, signal and communications theory
Join cost
Lattices
linear dynamic models (LDM)
Natural language processing
perceptual listening tests
Power measurement
Signal processing
Smoothing
Smoothing methods
Spectra
Speech processing
Speech recognition
Speech synthesis
Stimuli
Studies
Telecommunications and information theory
Testing
unit selection
Viterbi algorithm
title Subjective evaluation of join cost and smoothing methods for unit selection speech synthesis
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T23%3A25%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Subjective%20evaluation%20of%20join%20cost%20and%20smoothing%20methods%20for%20unit%20selection%20speech%20synthesis&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Vepa,%20J.&rft.date=2006-09-01&rft.volume=14&rft.issue=5&rft.spage=1763&rft.epage=1771&rft.pages=1763-1771&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TSA.2005.858548&rft_dat=%3Cproquest_RIE%3E2568845531%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=917430710&rft_id=info:pmid/&rft_ieee_id=1677995&rfr_iscdi=true