Supervisory Data Alignment for Text-Independent Voice Conversion

We propose new supervisory data alignment methods for text-independent voice conversion which do not need parallel training corpora. Phonetic information is used as a restriction during alignment for mapping the data from the source speaker onto the parameter space of a target speaker. Both linear a...

Full description

Bibliographic details
Published in: IEEE transactions on audio, speech, and language processing, 2010-07, Vol.18 (5), p.932-943
Main authors: Jianhua Tao ; Meng Zhang ; Nurminen, Jani ; Jilei Tian ; Xia Wang
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 943
container_issue 5
container_start_page 932
container_title IEEE transactions on audio, speech, and language processing
container_volume 18
creator Jianhua Tao
Meng Zhang
Nurminen, Jani
Jilei Tian
Xia Wang
description We propose new supervisory data alignment methods for text-independent voice conversion which do not need parallel training corpora. Phonetic information is used as a restriction during alignment for mapping the data from the source speaker onto the parameter space of a target speaker. Both linear and nonlinear methods are derived by considering alignment accuracy and topology preservation. For the linear alignment, we consider common phoneme clusters of the source and target space as benchmarks and adapt the source data vector to the target space while maintaining the relative phonetic positions among neighborhood clusters. In order to preserve the topological structure of the source parameter space and improve the stability of conversion and the accuracy of the phonetic mapping, a supervised self-organizing learning algorithm considering phonetic restriction is proposed for iteratively improving the alignment outcome of the previous step. Both the linear and nonlinear methods can also be applied in the cross-lingual case. Evaluation results show that the proposed methods improve the performance of alignment in terms of both alignment accuracy and stability for text-independent voice conversion in intra-lingual and cross-lingual cases.
doi_str_mv 10.1109/TASL.2010.2041688
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_5485203</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5485203</ieee_id><sourcerecordid>753681166</sourcerecordid><originalsourceid>FETCH-LOGICAL-c391t-583e970b006c118d5ba7e84a6a9911b0ccaef561f61792fd02439b72bb12af043</originalsourceid><addsrcrecordid>eNpdkEtLw0AUhQdRsFZ_gLgJuHCVeu8kM5nsLPVVKLhodTtM0htJSTNxJi323zulpQs398V3LofD2C3CCBHyx8V4PhtxCCuHFKVSZ2yAQqg4y3l6fppRXrIr71cAaSJTHLCn-aYjt629dbvo2fQmGjf1d7umto8q66IF_fbxtF1SR6GE45etS4omtt2S87Vtr9lFZRpPN8c-ZJ-vL4vJezz7eJtOxrO4THLsY6ESyjMoAGSJqJaiMBmp1EiT54gFlKWhSkisJAbH1RJ4muRFxosCuamC2yF7OPztnP3ZkO_1uvYlNY1pyW68zkQiFaKUgbz_R67sxrXBnEbgGUclAAKFB6p01ntHle5cvTZuFyC9j1TvI9X7SPUx0qC5O2hqIjrxIlWCQ5L8AY5zcPc</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1027218500</pqid></control><display><type>article</type><title>Supervisory Data Alignment for Text-Independent Voice Conversion</title><source>IEEE Electronic Library (IEL)</source><creator>Jianhua Tao ; Meng Zhang ; Nurminen, Jani ; Jilei Tian ; Xia Wang</creator><creatorcontrib>Jianhua Tao ; Meng Zhang ; Nurminen, Jani ; Jilei Tian ; Xia Wang</creatorcontrib><description>We propose new supervisory data alignment methods for text-independent voice conversion which do not need parallel training corpora. Phonetic information is used as a restriction during alignment for mapping the data from the source speaker onto the parameter space of a target speaker. Both linear and nonlinear methods are derived by considering alignment accuracy and topology preservation. For the linear alignment, we consider common phoneme clusters of the source and target space as benchmarks and adapt the source data vector to the target space while maintaining the relative phonetic positions among neighborhood clusters. 
In order to preserve the topological structure of the source parameter space and improve the stability of conversion and the accuracy of the phonetic mapping, a supervised self-organizing learning algorithm considering phonetic restriction is proposed for iteratively improving the alignment outcome of the previous step. Both the linear and nonlinear methods can also be applied in the cross-lingual case. Evaluation results show that the proposed methods improve the performance of alignment in terms of both alignment accuracy and stability for text-independent voice conversion in intra-lingual and cross-lingual cases.</description><identifier>ISSN: 1558-7916</identifier><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 1558-7924</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TASL.2010.2041688</identifier><identifier>CODEN: ITASD8</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Accuracy ; Alignment ; Artificial neural networks ; Clustering algorithms ; Clusters ; Constrictions ; Conversion ; Data alignment ; Iterative algorithms ; Laboratories ; Loudspeakers ; Nonlinearity ; Phonetics ; self-organized learning ; Speech ; Stability ; Studies ; supervisory phonetic restriction ; text-independent voice conversion ; Topology ; Vector quantization ; Voice</subject><ispartof>IEEE transactions on audio, speech, and language processing, 2010-07, Vol.18 (5), p.932-943</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) Jul 2010</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c391t-583e970b006c118d5ba7e84a6a9911b0ccaef561f61792fd02439b72bb12af043</citedby><cites>FETCH-LOGICAL-c391t-583e970b006c118d5ba7e84a6a9911b0ccaef561f61792fd02439b72bb12af043</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5485203$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5485203$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Jianhua Tao</creatorcontrib><creatorcontrib>Meng Zhang</creatorcontrib><creatorcontrib>Nurminen, Jani</creatorcontrib><creatorcontrib>Jilei Tian</creatorcontrib><creatorcontrib>Xia Wang</creatorcontrib><title>Supervisory Data Alignment for Text-Independent Voice Conversion</title><title>IEEE transactions on audio, speech, and language processing</title><addtitle>TASL</addtitle><description>We propose new supervisory data alignment methods for text-independent voice conversion which do not need parallel training corpora. Phonetic information is used as a restriction during alignment for mapping the data from the source speaker onto the parameter space of a target speaker. Both linear and nonlinear methods are derived by considering alignment accuracy and topology preservation. For the linear alignment, we consider common phoneme clusters of the source and target space as benchmarks and adapt the source data vector to the target space while maintaining the relative phonetic positions among neighborhood clusters. 
In order to preserve the topological structure of the source parameter space and improve the stability of conversion and the accuracy of the phonetic mapping, a supervised self-organizing learning algorithm considering phonetic restriction is proposed for iteratively improving the alignment outcome of the previous step. Both the linear and nonlinear methods can also be applied in the cross-lingual case. Evaluation results show that the proposed methods improve the performance of alignment in terms of both alignment accuracy and stability for text-independent voice conversion in intra-lingual and cross-lingual cases.</description><subject>Accuracy</subject><subject>Alignment</subject><subject>Artificial neural networks</subject><subject>Clustering algorithms</subject><subject>Clusters</subject><subject>Constrictions</subject><subject>Conversion</subject><subject>Data alignment</subject><subject>Iterative algorithms</subject><subject>Laboratories</subject><subject>Loudspeakers</subject><subject>Nonlinearity</subject><subject>Phonetics</subject><subject>self-organized learning</subject><subject>Speech</subject><subject>Stability</subject><subject>Studies</subject><subject>supervisory phonetic restriction</subject><subject>text-independent voice conversion</subject><subject>Topology</subject><subject>Vector 
quantization</subject><subject>Voice</subject><issn>1558-7916</issn><issn>2329-9290</issn><issn>1558-7924</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2010</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkEtLw0AUhQdRsFZ_gLgJuHCVeu8kM5nsLPVVKLhodTtM0htJSTNxJi323zulpQs398V3LofD2C3CCBHyx8V4PhtxCCuHFKVSZ2yAQqg4y3l6fppRXrIr71cAaSJTHLCn-aYjt629dbvo2fQmGjf1d7umto8q66IF_fbxtF1SR6GE45etS4omtt2S87Vtr9lFZRpPN8c-ZJ-vL4vJezz7eJtOxrO4THLsY6ESyjMoAGSJqJaiMBmp1EiT54gFlKWhSkisJAbH1RJ4muRFxosCuamC2yF7OPztnP3ZkO_1uvYlNY1pyW68zkQiFaKUgbz_R67sxrXBnEbgGUclAAKFB6p01ntHle5cvTZuFyC9j1TvI9X7SPUx0qC5O2hqIjrxIlWCQ5L8AY5zcPc</recordid><startdate>20100701</startdate><enddate>20100701</enddate><creator>Jianhua Tao</creator><creator>Meng Zhang</creator><creator>Nurminen, Jani</creator><creator>Jilei Tian</creator><creator>Xia Wang</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20100701</creationdate><title>Supervisory Data Alignment for Text-Independent Voice Conversion</title><author>Jianhua Tao ; Meng Zhang ; Nurminen, Jani ; Jilei Tian ; Xia Wang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c391t-583e970b006c118d5ba7e84a6a9911b0ccaef561f61792fd02439b72bb12af043</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Accuracy</topic><topic>Alignment</topic><topic>Artificial neural networks</topic><topic>Clustering algorithms</topic><topic>Clusters</topic><topic>Constrictions</topic><topic>Conversion</topic><topic>Data alignment</topic><topic>Iterative 
algorithms</topic><topic>Laboratories</topic><topic>Loudspeakers</topic><topic>Nonlinearity</topic><topic>Phonetics</topic><topic>self-organized learning</topic><topic>Speech</topic><topic>Stability</topic><topic>Studies</topic><topic>supervisory phonetic restriction</topic><topic>text-independent voice conversion</topic><topic>Topology</topic><topic>Vector quantization</topic><topic>Voice</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jianhua Tao</creatorcontrib><creatorcontrib>Meng Zhang</creatorcontrib><creatorcontrib>Nurminen, Jani</creatorcontrib><creatorcontrib>Jilei Tian</creatorcontrib><creatorcontrib>Xia Wang</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Jianhua Tao</au><au>Meng Zhang</au><au>Nurminen, Jani</au><au>Jilei Tian</au><au>Xia Wang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Supervisory Data Alignment for Text-Independent Voice Conversion</atitle><jtitle>IEEE transactions on audio, speech, and language 
processing</jtitle><stitle>TASL</stitle><date>2010-07-01</date><risdate>2010</risdate><volume>18</volume><issue>5</issue><spage>932</spage><epage>943</epage><pages>932-943</pages><issn>1558-7916</issn><issn>2329-9290</issn><eissn>1558-7924</eissn><eissn>2329-9304</eissn><coden>ITASD8</coden><abstract>We propose new supervisory data alignment methods for text-independent voice conversion which do not need parallel training corpora. Phonetic information is used as a restriction during alignment for mapping the data from the source speaker onto the parameter space of a target speaker. Both linear and nonlinear methods are derived by considering alignment accuracy and topology preservation. For the linear alignment, we consider common phoneme clusters of the source and target space as benchmarks and adapt the source data vector to the target space while maintaining the relative phonetic positions among neighborhood clusters. In order to preserve the topological structure of the source parameter space and improve the stability of conversion and the accuracy of the phonetic mapping, a supervised self-organizing learning algorithm considering phonetic restriction is proposed for iteratively improving the alignment outcome of the previous step. Both the linear and nonlinear methods can also be applied in the cross-lingual case. Evaluation results show that the proposed methods improve the performance of alignment in terms of both alignment accuracy and stability for text-independent voice conversion in intra-lingual and cross-lingual cases.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TASL.2010.2041688</doi><tpages>12</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1558-7916
ispartof IEEE transactions on audio, speech, and language processing, 2010-07, Vol.18 (5), p.932-943
issn 1558-7916
2329-9290
1558-7924
2329-9304
language eng
recordid cdi_ieee_primary_5485203
source IEEE Electronic Library (IEL)
subjects Accuracy
Alignment
Artificial neural networks
Clustering algorithms
Clusters
Constrictions
Conversion
Data alignment
Iterative algorithms
Laboratories
Loudspeakers
Nonlinearity
Phonetics
self-organized learning
Speech
Stability
Studies
supervisory phonetic restriction
text-independent voice conversion
Topology
Vector quantization
Voice
title Supervisory Data Alignment for Text-Independent Voice Conversion
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T17%3A52%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Supervisory%20Data%20Alignment%20for%20Text-Independent%20Voice%20Conversion&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Jianhua%20Tao&rft.date=2010-07-01&rft.volume=18&rft.issue=5&rft.spage=932&rft.epage=943&rft.pages=932-943&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASL.2010.2041688&rft_dat=%3Cproquest_RIE%3E753681166%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1027218500&rft_id=info:pmid/&rft_ieee_id=5485203&rfr_iscdi=true