Voice Conversion Based on Weighted Frequency Warping

Any modification applied to speech signals has an impact on their perceptual quality. In particular, voice conversion to modify a source voice so that it is perceived as a specific target voice involves prosodic and spectral transformations that produce significant quality degradation. Choosing amon...

Detailed description

Bibliographic details
Published in:	IEEE transactions on audio, speech, and language processing, 2010-07, Vol.18 (5), p.922-931
Main authors:	Erro, Daniel; Moreno, Asunción; Bonafonte, Antonio
Format:	Article
Language:	eng
Subjects:	Conversion; Degradation; Frequency conversion; Frequency synthesizers; Gaussian mixture models (GMMs); harmonic plus stochastic model (HSM); Loudspeakers; Piecewise linear techniques; Power harmonic filters; Probability theory; Similarity; Spatial databases; Speech; Speech analysis; Speech synthesis; Stochastic processes; Studies; Transformations; Voice; voice conversion; Warpage; Warping; weighted frequency warping

container_end_page	931
container_issue	5
container_start_page	922
container_title	IEEE transactions on audio, speech, and language processing
container_volume	18
creator	Erro, Daniel; Moreno, Asunción; Bonafonte, Antonio
description	Any modification applied to speech signals has an impact on their perceptual quality. In particular, voice conversion to modify a source voice so that it is perceived as a specific target voice involves prosodic and spectral transformations that produce significant quality degradation. Choosing among the current voice conversion methods represents a trade-off between the similarity of the converted voice to the target voice and the quality of the resulting converted speech, both rated by listeners. This paper presents a new voice conversion method termed Weighted Frequency Warping that has a good balance between similarity and quality. This method uses a time-varying piecewise-linear frequency warping function and an energy correction filter, and it combines typical probabilistic techniques and frequency warping transformations. Compared to standard probabilistic systems, Weighted Frequency Warping results in a significant increase in quality scores, whereas the conversion scores remain almost unaltered. This paper carefully discusses the theoretical aspects of the method and the details of its implementation, and the results of an international evaluation of the new system are also included.
doi_str_mv	10.1109/TASL.2009.2038663
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_miscellaneous_753680556</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5353707</ieee_id><sourcerecordid>2717115101</sourcerecordid><originalsourceid>FETCH-LOGICAL-c391t-1878b936bac817ae58c9e833e1bc47065a8a86cb04987bbbbb333ee6ae112cd53</originalsourceid><addsrcrecordid>eNpdkE9LAzEQxYMoWKsfQLwsePC0NbP5u8darAoFD1Z7DNl0WlPa3ZpshX57s7T04BxmHsxvhscj5BboAICWj9Phx2RQUFqmxrSU7Iz0QAidq7Lg5ycN8pJcxbiilDPJoUf4V-MdZqOm_sUQfVNnTzbiPEtihn753SY9Dvizw9rts5kNW18vr8nFwq4j3hxnn3yOn6ej13zy_vI2Gk5yx0poc9BKVyWTlXUalEWhXYmaMYTKcUWlsNpq6SrKS62qrlhaorQIULi5YH3ycPi7DU1yEFuz8dHhem1rbHbRKMGkpkLIRN7_I1fNLtTJnAFaqAI0lzpRcKBcaGIMuDDb4Dc27BNkuhhNF6PpYjTHGNPN3eHGI-KJF0wwRRX7A0eDbMw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1027218468</pqid></control><display><type>article</type><title>Voice Conversion Based on Weighted Frequency Warping</title><source>IEEE Electronic Library (IEL)</source><creator>Erro, Daniel ; Moreno, Asunción ; Bonafonte, Antonio</creator><creatorcontrib>Erro, Daniel ; Moreno, Asunción ; Bonafonte, Antonio</creatorcontrib><description>Any modification applied to speech signals has an impact on their perceptual quality. In particular, voice conversion to modify a source voice so that it is perceived as a specific target voice involves prosodic and spectral transformations that produce significant quality degradation. Choosing among the current voice conversion methods represents a trade-off between the similarity of the converted voice to the target voice and the quality of the resulting converted speech, both rated by listeners. This paper presents a new voice conversion method termed Weighted Frequency Warping that has a good balance between similarity and quality. 
This method uses a time-varying piecewise-linear frequency warping function and an energy correction filter, and it combines typical probabilistic techniques and frequency warping transformations. Compared to standard probabilistic systems, Weighted Frequency Warping results in a significant increase in quality scores, whereas the conversion scores remain almost unaltered. This paper carefully discusses the theoretical aspects of the method and the details of its implementation, and the results of an international evaluation of the new system are also included.</description><identifier>ISSN: 1558-7916</identifier><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 1558-7924</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TASL.2009.2038663</identifier><identifier>CODEN: ITASD8</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Conversion ; Degradation ; Frequency conversion ; Frequency synthesizers ; Gaussian mixture models (GMMs) ; harmonic plus stochastic model (HSM) ; Loudspeakers ; Piecewise linear techniques ; Power harmonic filters ; Probability theory ; Similarity ; Spatial databases ; Speech ; Speech analysis ; Speech synthesis ; Stochastic processes ; Studies ; Transformations ; Voice ; voice conversion ; Warpage ; Warping ; weighted frequency warping</subject><ispartof>IEEE transactions on audio, speech, and language processing, 2010-07, Vol.18 (5), p.922-931</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) Jul 2010</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c391t-1878b936bac817ae58c9e833e1bc47065a8a86cb04987bbbbb333ee6ae112cd53</citedby><cites>FETCH-LOGICAL-c391t-1878b936bac817ae58c9e833e1bc47065a8a86cb04987bbbbb333ee6ae112cd53</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5353707$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5353707$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Erro, Daniel</creatorcontrib><creatorcontrib>Moreno, Asunción</creatorcontrib><creatorcontrib>Bonafonte, Antonio</creatorcontrib><title>Voice Conversion Based on Weighted Frequency Warping</title><title>IEEE transactions on audio, speech, and language processing</title><addtitle>TASL</addtitle><description>Any modification applied to speech signals has an impact on their perceptual quality. In particular, voice conversion to modify a source voice so that it is perceived as a specific target voice involves prosodic and spectral transformations that produce significant quality degradation. Choosing among the current voice conversion methods represents a trade-off between the similarity of the converted voice to the target voice and the quality of the resulting converted speech, both rated by listeners. This paper presents a new voice conversion method termed Weighted Frequency Warping that has a good balance between similarity and quality. This method uses a time-varying piecewise-linear frequency warping function and an energy correction filter, and it combines typical probabilistic techniques and frequency warping transformations. 
Compared to standard probabilistic systems, Weighted Frequency Warping results in a significant increase in quality scores, whereas the conversion scores remain almost unaltered. This paper carefully discusses the theoretical aspects of the method and the details of its implementation, and the results of an international evaluation of the new system are also included.</description><subject>Conversion</subject><subject>Degradation</subject><subject>Frequency conversion</subject><subject>Frequency synthesizers</subject><subject>Gaussian mixture models (GMMs)</subject><subject>harmonic plus stochastic model (HSM)</subject><subject>Loudspeakers</subject><subject>Piecewise linear techniques</subject><subject>Power harmonic filters</subject><subject>Probability theory</subject><subject>Similarity</subject><subject>Spatial databases</subject><subject>Speech</subject><subject>Speech analysis</subject><subject>Speech synthesis</subject><subject>Stochastic processes</subject><subject>Studies</subject><subject>Transformations</subject><subject>Voice</subject><subject>voice conversion</subject><subject>Warpage</subject><subject>Warping</subject><subject>weighted frequency warping</subject><issn>1558-7916</issn><issn>2329-9290</issn><issn>1558-7924</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2010</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkE9LAzEQxYMoWKsfQLwsePC0NbP5u8darAoFD1Z7DNl0WlPa3ZpshX57s7T04BxmHsxvhscj5BboAICWj9Phx2RQUFqmxrSU7Iz0QAidq7Lg5ycN8pJcxbiilDPJoUf4V-MdZqOm_sUQfVNnTzbiPEtihn753SY9Dvizw9rts5kNW18vr8nFwq4j3hxnn3yOn6ej13zy_vI2Gk5yx0poc9BKVyWTlXUalEWhXYmaMYTKcUWlsNpq6SrKS62qrlhaorQIULi5YH3ycPi7DU1yEFuz8dHhem1rbHbRKMGkpkLIRN7_I1fNLtTJnAFaqAI0lzpRcKBcaGIMuDDb4Dc27BNkuhhNF6PpYjTHGNPN3eHGI-KJF0wwRRX7A0eDbMw</recordid><startdate>20100701</startdate><enddate>20100701</enddate><creator>Erro, Daniel</creator><creator>Moreno, Asunción</creator><creator>Bonafonte, 
Antonio</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20100701</creationdate><title>Voice Conversion Based on Weighted Frequency Warping</title><author>Erro, Daniel ; Moreno, Asunción ; Bonafonte, Antonio</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c391t-1878b936bac817ae58c9e833e1bc47065a8a86cb04987bbbbb333ee6ae112cd53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Conversion</topic><topic>Degradation</topic><topic>Frequency conversion</topic><topic>Frequency synthesizers</topic><topic>Gaussian mixture models (GMMs)</topic><topic>harmonic plus stochastic model (HSM)</topic><topic>Loudspeakers</topic><topic>Piecewise linear techniques</topic><topic>Power harmonic filters</topic><topic>Probability theory</topic><topic>Similarity</topic><topic>Spatial databases</topic><topic>Speech</topic><topic>Speech analysis</topic><topic>Speech synthesis</topic><topic>Stochastic processes</topic><topic>Studies</topic><topic>Transformations</topic><topic>Voice</topic><topic>voice conversion</topic><topic>Warpage</topic><topic>Warping</topic><topic>weighted frequency warping</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Erro, Daniel</creatorcontrib><creatorcontrib>Moreno, Asunción</creatorcontrib><creatorcontrib>Bonafonte, Antonio</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and 
Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Erro, Daniel</au><au>Moreno, Asunción</au><au>Bonafonte, Antonio</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Voice Conversion Based on Weighted Frequency Warping</atitle><jtitle>IEEE transactions on audio, speech, and language processing</jtitle><stitle>TASL</stitle><date>2010-07-01</date><risdate>2010</risdate><volume>18</volume><issue>5</issue><spage>922</spage><epage>931</epage><pages>922-931</pages><issn>1558-7916</issn><issn>2329-9290</issn><eissn>1558-7924</eissn><eissn>2329-9304</eissn><coden>ITASD8</coden><abstract>Any modification applied to speech signals has an impact on their perceptual quality. In particular, voice conversion to modify a source voice so that it is perceived as a specific target voice involves prosodic and spectral transformations that produce significant quality degradation. Choosing among the current voice conversion methods represents a trade-off between the similarity of the converted voice to the target voice and the quality of the resulting converted speech, both rated by listeners. This paper presents a new voice conversion method termed Weighted Frequency Warping that has a good balance between similarity and quality. This method uses a time-varying piecewise-linear frequency warping function and an energy correction filter, and it combines typical probabilistic techniques and frequency warping transformations. 
Compared to standard probabilistic systems, Weighted Frequency Warping results in a significant increase in quality scores, whereas the conversion scores remain almost unaltered. This paper carefully discusses the theoretical aspects of the method and the details of its implementation, and the results of an international evaluation of the new system are also included.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TASL.2009.2038663</doi><tpages>10</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1558-7916
ispartof	IEEE transactions on audio, speech, and language processing, 2010-07, Vol.18 (5), p.922-931
issn	1558-7916 2329-9290 1558-7924 2329-9304
language	eng
recordid	cdi_proquest_miscellaneous_753680556
source	IEEE Electronic Library (IEL)
subjects	Conversion; Degradation; Frequency conversion; Frequency synthesizers; Gaussian mixture models (GMMs); harmonic plus stochastic model (HSM); Loudspeakers; Piecewise linear techniques; Power harmonic filters; Probability theory; Similarity; Spatial databases; Speech; Speech analysis; Speech synthesis; Stochastic processes; Studies; Transformations; Voice; voice conversion; Warpage; Warping; weighted frequency warping
title	Voice Conversion Based on Weighted Frequency Warping
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T05%3A47%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Voice%20Conversion%20Based%20on%20Weighted%20Frequency%20Warping&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Erro,%20Daniel&rft.date=2010-07-01&rft.volume=18&rft.issue=5&rft.spage=922&rft.epage=931&rft.pages=922-931&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASL.2009.2038663&rft_dat=%3Cproquest_RIE%3E2717115101%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1027218468&rft_id=info:pmid/&rft_ieee_id=5353707&rfr_iscdi=true