Subjective evaluation of join cost and smoothing methods for unit selection speech synthesis
Published in: | IEEE transactions on audio, speech, and language processing, 2006-09, Vol.14 (5), p.1763-1771 |
---|---|
Main authors: | Vepa, J.; King, S. |
Format: | Article |
Language: | eng |
container_end_page | 1771 |
---|---|
container_issue | 5 |
container_start_page | 1763 |
container_title | IEEE transactions on audio, speech, and language processing |
container_volume | 14 |
creator | Vepa, J.; King, S. |
description | In unit selection-based concatenative speech synthesis, join cost (also known as concatenation cost), which measures how well two units can be joined together, is one of the main criteria for selecting appropriate units from the inventory. Usually, some form of local parameter smoothing is also needed to disguise the remaining discontinuities. This paper presents a subjective evaluation of three join cost functions and three smoothing methods. We also describe the design and performance of a listening test. The three join cost functions were taken from our previous study, where we proposed join cost functions derived from spectral distances, which have good correlations with perceptual scores obtained for a range of concatenation discontinuities. This evaluation allows us to further validate their ability to predict concatenation discontinuities. The units for synthesis stimuli are obtained from a state-of-the-art unit selection text-to-speech system: rVoice from Rhetorical Systems Ltd. In this paper, we report listeners' preferences for each join cost in combination with each smoothing method. |
doi_str_mv | 10.1109/TSA.2005.858548 |
format | Article |
identifier | ISSN: 1558-7916 |
ispartof | IEEE transactions on audio, speech, and language processing, 2006-09, Vol.14 (5), p.1763-1771 |
issn | 1558-7916 2329-9290 1558-7924 2329-9304 |
language | eng |
recordid | cdi_pascalfrancis_primary_18049106 |
source | IEEE Electronic Library (IEL) |
subjects | Applied sciences; Cepstral analysis; Cost function; Design engineering; Discontinuity; Exact sciences and technology; Humans; Information, signal and communications theory; Join cost; Lattices; linear dynamic models (LDM); Natural language processing; perceptual listening tests; Power measurement; Signal processing; Smoothing; Smoothing methods; Spectra; Speech processing; Speech recognition; Speech synthesis; Stimuli; Studies; Telecommunications and information theory; Testing; unit selection; Viterbi algorithm |
title | Subjective evaluation of join cost and smoothing methods for unit selection speech synthesis |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T23%3A25%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Subjective%20evaluation%20of%20join%20cost%20and%20smoothing%20methods%20for%20unit%20selection%20speech%20synthesis&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Vepa,%20J.&rft.date=2006-09-01&rft.volume=14&rft.issue=5&rft.spage=1763&rft.epage=1771&rft.pages=1763-1771&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TSA.2005.858548&rft_dat=%3Cproquest_RIE%3E2568845531%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=917430710&rft_id=info:pmid/&rft_ieee_id=1677995&rfr_iscdi=true |
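The abstract defines join cost as a measure of how well two units can be joined and mentions local parameter smoothing to disguise remaining discontinuities. The sketch below is a generic illustration of those two ideas. It is NOT one of the three join cost functions or three smoothing methods evaluated in the paper. It assumes a hypothetical representation in which each unit is a list of cepstral feature vectors, uses a plain Euclidean distance between the frames on either side of the join, and applies a simple linear spreading of the boundary jump. The function names `euclidean_join_cost` and `smooth_join` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation (assumption): a unit is a sequence of frames,
# each frame a fixed-dimension cepstral feature vector.
Frame = Sequence[float]
Unit = Sequence[Frame]


def euclidean_join_cost(left: Unit, right: Unit) -> float:
    """Euclidean distance between the last frame of `left` and the first frame of `right`."""
    if not left or not right:
        raise ValueError("both units need at least one frame")
    a, b = left[-1], right[0]
    if len(a) != len(b):
        raise ValueError("frames must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smooth_join(left: Unit, right: Unit, n: int) -> Tuple[List[List[float]], List[List[float]]]:
    """Linearly spread the boundary jump over up to n frames on each side.

    Returns smoothed copies; the inputs are not modified. Afterwards both
    boundary frames sit at the midpoint of the original jump.
    """
    new_left = [list(f) for f in left]
    new_right = [list(f) for f in right]
    n = min(n, len(new_left), len(new_right))
    if n <= 0:
        return new_left, new_right
    if len(new_left[-1]) != len(new_right[0]):
        raise ValueError("frames must have the same dimension")
    # Size of the discontinuity at the join, computed before any frame changes.
    delta = [b - a for a, b in zip(new_left[-1], new_right[0])]
    for k in range(n):
        # Left side: shift rises from 1/(2n) to 1/2 of the jump toward the boundary.
        w = (k + 1) / (2 * n)
        frame = new_left[len(new_left) - n + k]
        for d, step in enumerate(delta):
            frame[d] += w * step
    for j in range(n):
        # Right side: shift falls from 1/2 of the jump at the boundary to 1/(2n).
        w = (n - j) / (2 * n)
        frame = new_right[j]
        for d, step in enumerate(delta):
            frame[d] -= w * step
    return new_left, new_right


if __name__ == "__main__":
    a = [[0.0, 0.0], [1.0, 2.0]]
    b = [[4.0, 6.0], [0.0, 0.0]]
    print(euclidean_join_cost(a, b))  # 5.0
    print(smooth_join([[0.0], [0.0]], [[4.0], [4.0]], 2))
```

In a full unit selection system, a join cost like this would typically be combined with a target cost and minimized over whole candidate sequences (for example with a Viterbi search) rather than evaluated for one join in isolation.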