Polyglot Speech Synthesis Based on Cross-Lingual Frame Selection Using Auditory and Articulatory Features
In this paper, an approach for polyglot speech synthesis based on cross-lingual frame selection is proposed. This method requires only mono-lingual speech data of different speakers in different languages for building a polyglot synthesis system, thus reducing the burden of data collection. Essentially, a set of artificial utterances in the second language for a target speaker is constructed based on the proposed cross-lingual frame-selection process, and this data set is used to adapt a synthesis model in the second language to the speaker.
Saved in:
Published in: | IEEE/ACM transactions on audio, speech, and language processing, 2014-10, Vol.22 (10), p.1558-1570 |
---|---|
Main authors: | Chen, Chia-Ping ; Huang, Yi-Chin ; Wu, Chung-Hsien ; Lee, Kuan-De |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 1570 |
---|---|
container_issue | 10 |
container_start_page | 1558 |
container_title | IEEE/ACM transactions on audio, speech, and language processing |
container_volume | 22 |
creator | Chen, Chia-Ping ; Huang, Yi-Chin ; Wu, Chung-Hsien ; Lee, Kuan-De |
description | In this paper, an approach for polyglot speech synthesis based on cross-lingual frame selection is proposed. This method requires only mono-lingual speech data of different speakers in different languages for building a polyglot synthesis system, thus reducing the burden of data collection. Essentially, a set of artificial utterances in the second language for a target speaker is constructed based on the proposed cross-lingual frame-selection process, and this data set is used to adapt a synthesis model in the second language to the speaker. In the cross-lingual frame-selection process, we propose to use auditory and articulatory features to improve the quality of the synthesized polyglot speech. For evaluation, a Mandarin-English polyglot system is implemented where the target speaker only speaks Mandarin. The results show that decent performance regarding voice identity and speech quality can be achieved with the proposed method. |
doi_str_mv | 10.1109/TASLP.2014.2339738 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2329-9290 |
ispartof | IEEE/ACM transactions on audio, speech, and language processing, 2014-10, Vol.22 (10), p.1558-1570 |
issn | 2329-9290 ; 2329-9304 |
language | eng |
recordid | cdi_proquest_journals_1564751919 |
source | IEEE Electronic Library (IEL) |
subjects | Adaptation models ; Articulatory features ; auditory features ; cross-lingual frame selection ; Feature extraction ; Hidden Markov models ; IEEE transactions ; polyglot speech synthesis ; Speech ; Speech synthesis |
title | Polyglot Speech Synthesis Based on Cross-Lingual Frame Selection Using Auditory and Articulatory Features |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T05%3A54%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Polyglot%20Speech%20Synthesis%20Based%20on%20Cross-Lingual%20Frame%20Selection%20Using%20Auditory%20and%20Articulatory%20Features&rft.jtitle=IEEE/ACM%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Chen,%20Chia-Ping&rft.date=2014-10&rft.volume=22&rft.issue=10&rft.spage=1558&rft.epage=1570&rft.pages=1558-1570&rft.issn=2329-9290&rft.eissn=2329-9304&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASLP.2014.2339738&rft_dat=%3Cproquest_RIE%3E3442342291%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1564751919&rft_id=info:pmid/&rft_ieee_id=6857339&rfr_iscdi=true |
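The abstract describes a cross-lingual frame-selection step: for each frame of a reference utterance in the second language, the closest frame from the target speaker's first-language data is chosen using auditory and articulatory features. A minimal sketch of that matching idea follows, assuming per-frame Euclidean distances over auditory (e.g., MFCC) and articulatory feature vectors combined with fixed weights; the function name, arguments, and weights are illustrative and not taken from the paper.

```python
# Hedged sketch of cross-lingual frame selection (illustrative only):
# pick, for each reference frame, the nearest frame in the target
# speaker's corpus under a weighted auditory + articulatory distance.
import numpy as np

def select_frames(ref_aud, ref_art, corpus_aud, corpus_art,
                  w_aud=0.5, w_art=0.5):
    """Return the index of the nearest corpus frame for each reference frame."""
    chosen = []
    for a, r in zip(ref_aud, ref_art):
        d_aud = np.linalg.norm(corpus_aud - a, axis=1)  # auditory distance to each corpus frame
        d_art = np.linalg.norm(corpus_art - r, axis=1)  # articulatory distance to each corpus frame
        chosen.append(int(np.argmin(w_aud * d_aud + w_art * d_art)))
    return chosen
```

The selected frames would then form the artificial second-language utterances used for speaker adaptation, as described in the abstract.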