Voice Conversion Based on Locally Linear Embedding
This paper presents a novel locally linear embedding (LLE)-based framework for exemplar-based spectral conversion (SC). The key feature of the proposed SC framework is that it integrates the LLE algorithm, a manifold learning method, with the conventional exemplar-based SC method. One important advantage of the LLE-based SC framework is that it can be applied to either one-to-one SC or many-to-one SC. For one-to-one SC, a parallel speech corpus consisting of the pre-specified source and target speakers' speeches is used to construct the paired source and target dictionaries in advance. During online conversion, the LLE-based SC method converts the source spectral features to target-like spectral features based on the paired dictionaries. On the other hand, when applied to many-to-one SC, our system is capable of converting the voice of any unseen source speaker to that of a desired target speaker, without the requirement of collecting parallel training speech utterances from them beforehand. To further improve the quality of the converted speech, the maximum likelihood parameter generation (MLPG) and global variance (GV) methods are adopted in the proposed SC systems. Experimental results demonstrate that the proposed one-to-one SC system is comparable with the state-of-the-art Gaussian mixture model (GMM)-based one-to-one SC system in terms of speech quality and speaker similarity, and the many-to-one SC system can approximate the performance of the one-to-one SC system.
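The conversion step the abstract describes — reconstructing each source frame from its nearest exemplars in the source dictionary and reusing those LLE weights on the paired target dictionary — can be sketched as follows. This is a minimal illustration of the general LLE exemplar technique, not the authors' implementation; all function and parameter names (`lle_convert`, `k`, `reg`) are assumptions, and the regularization choice is a common default rather than the paper's.

```python
import numpy as np

def lle_convert(x, src_dict, tgt_dict, k=8, reg=1e-3):
    """Convert one source spectral frame using LLE weights over paired dictionaries.

    x        : (d,) source spectral feature frame
    src_dict : (N, d) source exemplars, row-aligned with tgt_dict
    tgt_dict : (N, d) target exemplars
    """
    # 1. Find the k nearest source exemplars to the input frame.
    dists = np.linalg.norm(src_dict - x, axis=1)
    idx = np.argsort(dists)[:k]
    neighbors = src_dict[idx]                  # (k, d)

    # 2. Solve for LLE reconstruction weights: minimize ||x - w @ neighbors||^2
    #    subject to sum(w) = 1, via the local Gram matrix (closed form).
    diff = neighbors - x                       # (k, d)
    gram = diff @ diff.T                       # (k, k)
    gram += reg * np.eye(k)                    # regularize for numerical stability
    w = np.linalg.solve(gram, np.ones(k))
    w /= w.sum()                               # enforce the sum-to-one constraint

    # 3. Apply the same weights to the paired target exemplars.
    return w @ tgt_dict[idx]
```

Because the weights sum to one and the dictionaries are row-aligned, local geometry around the source frame is transferred onto the target speaker's exemplar space, which is the core idea the framework builds on.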
Saved in:
Published in: | Journal of Information Science and Engineering 2018-11, Vol.34 (6), p.1493-1516 |
---|---|
Main authors: | 黃信德(HSIN-TE HWANG), 吳宜樵(YI-CHIAO WU), 彭玉淮(YU-HUAI PENG), 許晉誠(CHIN-CHENG HSU), 曹昱(YU TSAO), 王新民(HSIN-MIN WANG), 王逸如(YIH-RU WANG), 陳信宏(SIN-HORNG CHEN) |
Format: | Article |
Language: | English |
Subjects: | Conversion; Dictionaries; Embedding; Machine learning; Manifolds (mathematics); Probabilistic models; Speech; State of the art |
Online access: | Full text |
creator | 黃信德(HSIN-TE HWANG) 吳宜樵(YI-CHIAO WU) 彭玉淮(YU-HUAI PENG) 許晉誠(CHIN-CHENG HSU) 曹昱(YU TSAO) 王新民(HSIN-MIN WANG) 王逸如(YIH-RU WANG) 陳信宏(SIN-HORNG CHEN) |
description | This paper presents a novel locally linear embedding (LLE)-based framework for exemplar-based spectral conversion (SC). The key feature of the proposed SC framework is that it integrates the LLE algorithm, a manifold learning method, with the conventional exemplar-based SC method. One important advantage of the LLE-based SC framework is that it can be applied to either one-to-one SC or many-to-one SC. For one-to-one SC, a parallel speech corpus consisting of the pre-specified source and target speakers' speeches is used to construct the paired source and target dictionaries in advance. During online conversion, the LLE-based SC method converts the source spectral features to the target like spectral features based on the paired dictionaries. On the other hand, when applied to many-to-one SC, our system is capable of converting the voice of any unseen source speaker to that of a desired target speaker, without the requirement of collecting parallel training speech utterances from them beforehand. To further improve the quality of the converted speech, the maximum likelihood parameter generation (MLPG) and global variance (GV) methods are adopted in the proposed SC systems. Experimental results demonstrate that the proposed one-to-one SC system is comparable with the state-of- the-art Gaussian mixture model (GMM)-based one-to-one SC system in terms of speech quality and speaker similarity, and the many-to-one SC system can approximate the performance of the one-to-one SC system. |
doi_str_mv | 10.6688/JISE.201811_34(6).0008 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 1016-2364 |
ispartof | Journal of Information Science and Engineering, 2018-11, Vol.34 (6), p.1493-1516 |
issn | 1016-2364 |
language | eng |
recordid | cdi_proquest_journals_2133830996 |
source | EZB-FREE-00999 freely available EZB journals |
subjects | Conversion; Dictionaries; Embedding; Machine learning; Manifolds (mathematics); Probabilistic models; Speech; State of the art |
title | Voice Conversion Based on Locally Linear Embedding |