Supervisory Data Alignment for Text-Independent Voice Conversion
We propose new supervisory data alignment methods for text-independent voice conversion which do not need parallel training corpora. Phonetic information is used as a restriction during alignment for mapping the data from the source speaker onto the parameter space of a target speaker. Both linear a...
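The full description further down says the linear alignment treats phoneme clusters common to both speakers as benchmarks. The record gives no equations, so the following is only a hypothetical sketch of one way a phonetically restricted linear mapping could work, not the authors' method. It computes per-phoneme centroids for each speaker and shifts every source frame by an inverse-distance-weighted average of the centroid offsets of its nearest shared clusters. The frame's own phoneme is always included as an anchor when the target speaker has it. The names `phoneme_centroids` and `align_linear`, the neighbour count `k`, and the inverse-distance weighting are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def phoneme_centroids(frames: Sequence[Sequence[float]],
                      labels: Sequence[str]) -> Dict[str, List[float]]:
    """Mean feature vector of each phoneme cluster (hypothetical helper)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for x, p in zip(frames, labels):
        acc = sums.setdefault(p, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[p] += 1
    return {p: [s / counts[p] for s in acc] for p, acc in sums.items()}


def align_linear(src_frames, src_labels, tgt_frames, tgt_labels, k=3, eps=1e-9):
    """Hypothetical phoneme-anchored linear alignment of source frames to the target space."""
    c_src = phoneme_centroids(src_frames, src_labels)
    c_tgt = phoneme_centroids(tgt_frames, tgt_labels)
    common = sorted(set(c_src) & set(c_tgt))
    if not common:
        raise ValueError("source and target share no phoneme clusters")
    # Offset that moves each shared source cluster centre onto its target counterpart.
    offsets = {p: [t - s for s, t in zip(c_src[p], c_tgt[p])] for p in common}

    aligned = []
    for x, label in zip(src_frames, src_labels):
        # Shared clusters ranked by distance in the source space, nearest first.
        ranked = sorted(common, key=lambda p: math.dist(x, c_src[p]))
        if label in offsets:
            # Phonetic restriction (assumed form): the frame's own phoneme is always an anchor.
            ranked.remove(label)
            ranked.insert(0, label)
        anchors = ranked[:k]
        # Inverse-distance weights keep the frame's position relative to nearby clusters.
        weights = [1.0 / (math.dist(x, c_src[p]) + eps) for p in anchors]
        total = sum(weights)
        shift = [sum(w * offsets[p][i] for w, p in zip(weights, anchors)) / total
                 for i in range(len(x))]
        aligned.append([xi + si for xi, si in zip(x, shift)])
    return aligned
```

If every target frame equals its source frame plus a constant vector, every shared-cluster offset equals that vector, so each frame is shifted by exactly it (up to floating-point rounding). Forcing the frame's own phoneme into the anchor set is what keeps a frame that lies near a different cluster from borrowing that cluster's offset.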
Published in: | IEEE transactions on audio, speech, and language processing, 2010-07, Vol.18 (5), p.932-943 |
---|---|
Main authors: | Jianhua Tao, Meng Zhang, Jani Nurminen, Jilei Tian, Xia Wang |
Format: | Article |
Language: | eng |
container_end_page | 943 |
---|---|
container_issue | 5 |
container_start_page | 932 |
container_title | IEEE transactions on audio, speech, and language processing |
container_volume | 18 |
creator | Jianhua Tao ; Meng Zhang ; Nurminen, Jani ; Jilei Tian ; Xia Wang |
description | We propose new supervisory data alignment methods for text-independent voice conversion that do not require parallel training corpora. Phonetic information is used as a restriction during alignment when mapping data from the source speaker onto the parameter space of a target speaker. Both linear and nonlinear methods are derived by considering alignment accuracy and topology preservation. For the linear alignment, we use the phoneme clusters common to the source and target spaces as benchmarks and adapt the source data vectors to the target space while maintaining their relative phonetic positions among neighboring clusters. To preserve the topological structure of the source parameter space and to improve both the stability of conversion and the accuracy of the phonetic mapping, we propose a supervised self-organizing learning algorithm with a phonetic restriction that iteratively improves the alignment obtained in the previous step. Both the linear and nonlinear methods can also be applied in the cross-lingual case. Evaluation results show that the proposed methods improve both alignment accuracy and stability for text-independent voice conversion in intra-lingual and cross-lingual cases. |
doi_str_mv | 10.1109/TASL.2010.2041688 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_5485203</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5485203</ieee_id><sourcerecordid>753681166</sourcerecordid><originalsourceid>FETCH-LOGICAL-c391t-583e970b006c118d5ba7e84a6a9911b0ccaef561f61792fd02439b72bb12af043</originalsourceid><addsrcrecordid>eNpdkEtLw0AUhQdRsFZ_gLgJuHCVeu8kM5nsLPVVKLhodTtM0htJSTNxJi323zulpQs398V3LofD2C3CCBHyx8V4PhtxCCuHFKVSZ2yAQqg4y3l6fppRXrIr71cAaSJTHLCn-aYjt629dbvo2fQmGjf1d7umto8q66IF_fbxtF1SR6GE45etS4omtt2S87Vtr9lFZRpPN8c-ZJ-vL4vJezz7eJtOxrO4THLsY6ESyjMoAGSJqJaiMBmp1EiT54gFlKWhSkisJAbH1RJ4muRFxosCuamC2yF7OPztnP3ZkO_1uvYlNY1pyW68zkQiFaKUgbz_R67sxrXBnEbgGUclAAKFB6p01ntHle5cvTZuFyC9j1TvI9X7SPUx0qC5O2hqIjrxIlWCQ5L8AY5zcPc</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1027218500</pqid></control><display><type>article</type><title>Supervisory Data Alignment for Text-Independent Voice Conversion</title><source>IEEE Electronic Library (IEL)</source><creator>Jianhua Tao ; Meng Zhang ; Nurminen, Jani ; Jilei Tian ; Xia Wang</creator><creatorcontrib>Jianhua Tao ; Meng Zhang ; Nurminen, Jani ; Jilei Tian ; Xia Wang</creatorcontrib><description>We propose new supervisory data alignment methods for text-independent voice conversion which do not need parallel training corpora. Phonetic information is used as a restriction during alignment for mapping the data from the source speaker onto the parameter space of a target speaker. Both linear and nonlinear methods are derived by considering alignment accuracy and topology preservation. For the linear alignment, we consider common phoneme clusters of the source and target space as benchmarks and adapt the source data vector to the target space while maintaining the relative phonetic positions among neighborhood clusters. 
In order to preserve the topological structure of the source parameter space and improve the stability of conversion and the accuracy of the phonetic mapping, a supervised self-organizing learning algorithm considering phonetic restriction is proposed for iteratively improving the alignment outcome of the previous step. Both the linear and nonlinear methods can also be applied in the cross-lingual case. Evaluation results show that the proposed methods improve the performance of alignment in terms of both alignment accuracy and stability for text-independent voice conversion in intra-lingual and cross-lingual cases.</description><identifier>ISSN: 1558-7916</identifier><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 1558-7924</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TASL.2010.2041688</identifier><identifier>CODEN: ITASD8</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Accuracy ; Alignment ; Artificial neural networks ; Clustering algorithms ; Clusters ; Constrictions ; Conversion ; Data alignment ; Iterative algorithms ; Laboratories ; Loudspeakers ; Nonlinearity ; Phonetics ; self-organized learning ; Speech ; Stability ; Studies ; supervisory phonetic restriction ; text-independent voice conversion ; Topology ; Vector quantization ; Voice</subject><ispartof>IEEE transactions on audio, speech, and language processing, 2010-07, Vol.18 (5), p.932-943</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) Jul 2010</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c391t-583e970b006c118d5ba7e84a6a9911b0ccaef561f61792fd02439b72bb12af043</citedby><cites>FETCH-LOGICAL-c391t-583e970b006c118d5ba7e84a6a9911b0ccaef561f61792fd02439b72bb12af043</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5485203$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5485203$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Jianhua Tao</creatorcontrib><creatorcontrib>Meng Zhang</creatorcontrib><creatorcontrib>Nurminen, Jani</creatorcontrib><creatorcontrib>Jilei Tian</creatorcontrib><creatorcontrib>Xia Wang</creatorcontrib><title>Supervisory Data Alignment for Text-Independent Voice Conversion</title><title>IEEE transactions on audio, speech, and language processing</title><addtitle>TASL</addtitle><description>We propose new supervisory data alignment methods for text-independent voice conversion which do not need parallel training corpora. Phonetic information is used as a restriction during alignment for mapping the data from the source speaker onto the parameter space of a target speaker. Both linear and nonlinear methods are derived by considering alignment accuracy and topology preservation. For the linear alignment, we consider common phoneme clusters of the source and target space as benchmarks and adapt the source data vector to the target space while maintaining the relative phonetic positions among neighborhood clusters. 
In order to preserve the topological structure of the source parameter space and improve the stability of conversion and the accuracy of the phonetic mapping, a supervised self-organizing learning algorithm considering phonetic restriction is proposed for iteratively improving the alignment outcome of the previous step. Both the linear and nonlinear methods can also be applied in the cross-lingual case. Evaluation results show that the proposed methods improve the performance of alignment in terms of both alignment accuracy and stability for text-independent voice conversion in intra-lingual and cross-lingual cases.</description><subject>Accuracy</subject><subject>Alignment</subject><subject>Artificial neural networks</subject><subject>Clustering algorithms</subject><subject>Clusters</subject><subject>Constrictions</subject><subject>Conversion</subject><subject>Data alignment</subject><subject>Iterative algorithms</subject><subject>Laboratories</subject><subject>Loudspeakers</subject><subject>Nonlinearity</subject><subject>Phonetics</subject><subject>self-organized learning</subject><subject>Speech</subject><subject>Stability</subject><subject>Studies</subject><subject>supervisory phonetic restriction</subject><subject>text-independent voice conversion</subject><subject>Topology</subject><subject>Vector 
quantization</subject><subject>Voice</subject><issn>1558-7916</issn><issn>2329-9290</issn><issn>1558-7924</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2010</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkEtLw0AUhQdRsFZ_gLgJuHCVeu8kM5nsLPVVKLhodTtM0htJSTNxJi323zulpQs398V3LofD2C3CCBHyx8V4PhtxCCuHFKVSZ2yAQqg4y3l6fppRXrIr71cAaSJTHLCn-aYjt629dbvo2fQmGjf1d7umto8q66IF_fbxtF1SR6GE45etS4omtt2S87Vtr9lFZRpPN8c-ZJ-vL4vJezz7eJtOxrO4THLsY6ESyjMoAGSJqJaiMBmp1EiT54gFlKWhSkisJAbH1RJ4muRFxosCuamC2yF7OPztnP3ZkO_1uvYlNY1pyW68zkQiFaKUgbz_R67sxrXBnEbgGUclAAKFB6p01ntHle5cvTZuFyC9j1TvI9X7SPUx0qC5O2hqIjrxIlWCQ5L8AY5zcPc</recordid><startdate>20100701</startdate><enddate>20100701</enddate><creator>Jianhua Tao</creator><creator>Meng Zhang</creator><creator>Nurminen, Jani</creator><creator>Jilei Tian</creator><creator>Xia Wang</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20100701</creationdate><title>Supervisory Data Alignment for Text-Independent Voice Conversion</title><author>Jianhua Tao ; Meng Zhang ; Nurminen, Jani ; Jilei Tian ; Xia Wang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c391t-583e970b006c118d5ba7e84a6a9911b0ccaef561f61792fd02439b72bb12af043</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Accuracy</topic><topic>Alignment</topic><topic>Artificial neural networks</topic><topic>Clustering algorithms</topic><topic>Clusters</topic><topic>Constrictions</topic><topic>Conversion</topic><topic>Data alignment</topic><topic>Iterative 
algorithms</topic><topic>Laboratories</topic><topic>Loudspeakers</topic><topic>Nonlinearity</topic><topic>Phonetics</topic><topic>self-organized learning</topic><topic>Speech</topic><topic>Stability</topic><topic>Studies</topic><topic>supervisory phonetic restriction</topic><topic>text-independent voice conversion</topic><topic>Topology</topic><topic>Vector quantization</topic><topic>Voice</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jianhua Tao</creatorcontrib><creatorcontrib>Meng Zhang</creatorcontrib><creatorcontrib>Nurminen, Jani</creatorcontrib><creatorcontrib>Jilei Tian</creatorcontrib><creatorcontrib>Xia Wang</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Jianhua Tao</au><au>Meng Zhang</au><au>Nurminen, Jani</au><au>Jilei Tian</au><au>Xia Wang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Supervisory Data Alignment for Text-Independent Voice Conversion</atitle><jtitle>IEEE transactions on audio, speech, and language 
processing</jtitle><stitle>TASL</stitle><date>2010-07-01</date><risdate>2010</risdate><volume>18</volume><issue>5</issue><spage>932</spage><epage>943</epage><pages>932-943</pages><issn>1558-7916</issn><issn>2329-9290</issn><eissn>1558-7924</eissn><eissn>2329-9304</eissn><coden>ITASD8</coden><abstract>We propose new supervisory data alignment methods for text-independent voice conversion which do not need parallel training corpora. Phonetic information is used as a restriction during alignment for mapping the data from the source speaker onto the parameter space of a target speaker. Both linear and nonlinear methods are derived by considering alignment accuracy and topology preservation. For the linear alignment, we consider common phoneme clusters of the source and target space as benchmarks and adapt the source data vector to the target space while maintaining the relative phonetic positions among neighborhood clusters. In order to preserve the topological structure of the source parameter space and improve the stability of conversion and the accuracy of the phonetic mapping, a supervised self-organizing learning algorithm considering phonetic restriction is proposed for iteratively improving the alignment outcome of the previous step. Both the linear and nonlinear methods can also be applied in the cross-lingual case. Evaluation results show that the proposed methods improve the performance of alignment in terms of both alignment accuracy and stability for text-independent voice conversion in intra-lingual and cross-lingual cases.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TASL.2010.2041688</doi><tpages>12</tpages></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1558-7916 |
ispartof | IEEE transactions on audio, speech, and language processing, 2010-07, Vol.18 (5), p.932-943 |
issn | 1558-7916 2329-9290 1558-7924 2329-9304 |
language | eng |
recordid | cdi_ieee_primary_5485203 |
source | IEEE Electronic Library (IEL) |
subjects | Accuracy Alignment Artificial neural networks Clustering algorithms Clusters Constrictions Conversion Data alignment Iterative algorithms Laboratories Loudspeakers Nonlinearity Phonetics self-organized learning Speech Stability Studies supervisory phonetic restriction text-independent voice conversion Topology Vector quantization Voice |
title | Supervisory Data Alignment for Text-Independent Voice Conversion |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T17%3A52%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Supervisory%20Data%20Alignment%20for%20Text-Independent%20Voice%20Conversion&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Jianhua%20Tao&rft.date=2010-07-01&rft.volume=18&rft.issue=5&rft.spage=932&rft.epage=943&rft.pages=932-943&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASL.2010.2041688&rft_dat=%3Cproquest_RIE%3E753681166%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1027218500&rft_id=info:pmid/&rft_ieee_id=5485203&rfr_iscdi=true |
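The description says a supervised self-organizing learning algorithm with a phonetic restriction iteratively refines the linear alignment while preserving the topology of the source space. The record does not specify that algorithm, so the following is a hypothetical sketch of one way such a refinement could look, not the authors' method. The nodes of a 1-D self-organizing map start at the linearly aligned source vectors, and their chain order stands in for the source topology (an assumption). The map is trained on labeled target frames, and the best-matching-unit search is restricted to nodes carrying the same phoneme label. The function name `refine_with_supervised_som`, the chain neighbourhood, and the linear learning-rate and radius schedules are illustrative choices.

```python
import math
import random
from typing import Dict, List, Sequence


def refine_with_supervised_som(
    init_nodes: Sequence[Sequence[float]],
    node_labels: Sequence[str],
    tgt_frames: Sequence[Sequence[float]],
    tgt_labels: Sequence[str],
    epochs: int = 20,
    lr0: float = 0.5,
    sigma0: float = 2.0,
    sigma_min: float = 0.5,
    seed: int = 0,
) -> List[List[float]]:
    """Hypothetical label-restricted SOM refinement on a 1-D chain of nodes."""
    if len(init_nodes) != len(node_labels):
        raise ValueError("one phoneme label is needed per node")
    rng = random.Random(seed)
    nodes = [list(map(float, v)) for v in init_nodes]
    # Index nodes by phoneme so the best-matching-unit search can be restricted.
    by_label: Dict[str, List[int]] = {}
    for i, p in enumerate(node_labels):
        by_label.setdefault(p, []).append(i)

    order = list(range(len(tgt_frames)))
    total_steps = max(1, epochs * len(order))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for j in order:
            x, p = tgt_frames[j], tgt_labels[j]
            frac = step / total_steps
            step += 1
            candidates = by_label.get(p)
            if not candidates:
                continue  # no node carries this phoneme, so the frame is skipped
            # Phonetic restriction: only nodes with the frame's label may win.
            bmu = min(candidates, key=lambda i: math.dist(nodes[i], x))
            lr = lr0 * (1.0 - frac)  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), sigma_min)  # shrinking neighbourhood radius
            for i, w in enumerate(nodes):
                # Gaussian neighbourhood on the chain index moves neighbouring nodes together.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                nodes[i] = [wk + lr * h * (xk - wk) for wk, xk in zip(w, x)]
    return nodes
```

The restriction matters when a node lies closer to another phoneme's target frames than to its own: an unrestricted search would let that node win for the wrong phoneme, whereas here it is only ever pulled as a winner toward frames with its own label. The neighbourhood term is the part meant to keep nodes that were adjacent in the source ordering moving together.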