Listening test materials for "A study of speaker adaptation for DNN-based speech synthesis"

The dataset contains the testing stimuli and listeners' MUSHRA test responses for the Interspeech 2015 paper, "A study of speaker adaptation for DNN-based speech synthesis". In this paper, we conduct an experimental analysis of speaker adaptation for Deep Neural Network (DNN) based sp...

Bibliographic details
Main author: Wu, Zhizheng
Format: Dataset
Language: eng
Subjects:
Online access: Order full text
creator Wu, Zhizheng
description The dataset contains the testing stimuli and listeners' MUSHRA test responses for the Interspeech 2015 paper, "A study of speaker adaptation for DNN-based speech synthesis". In this paper, we conduct an experimental analysis of speaker adaptation for Deep Neural Network (DNN) based speech synthesis at different levels. In particular, we augment a low-dimensional speaker-specific vector with linguistic features as input to represent speaker identity, perform model adaptation to scale the hidden activation weights, and perform a feature space transformation at the output layer to modify generated acoustic features. We systematically analyse the performance of each individual adaptation technique and that of their combinations. Experimental results confirm the adaptability of the DNN, and listening tests demonstrate that the DNN can achieve significantly better adaptation performance than the hidden Markov model (HMM) baseline in terms of naturalness and speaker similarity.
doi_str_mv 10.7488/ds/259
format Dataset
fullrecord <record><control><sourceid>datacite_PQ8</sourceid><recordid>TN_cdi_datacite_primary_10_7488_ds_259</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_7488_ds_259</sourcerecordid><originalsourceid>FETCH-datacite_primary_10_7488_ds_2593</originalsourceid><addsrcrecordid>eNqNzrsOgkAQheFtLIyXZ5hQ2CHgJWJpvMTCUNlZbEZ2kI2ykJ2x4O0F4wNYnebLya_UNInnm1WaRoajxXo7VLeLZSFn3QOEWKBCIW_xxVDUHoIdsLxNC3UB3BA-yQMabATF1u5LDlkW3pHJ9IDyErh1UhJbDsZqUHRPNPntSM1Ox-v-HBoUzK2Qbryt0Lc6iXVfpQ3rrmr5N_wANFFC0Q</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>dataset</recordtype></control><display><type>dataset</type><title>Listening test materials for "A study of speaker adaptation for DNN-based speech synthesis"</title><source>DataCite</source><creator>Wu, Zhizheng</creator><creatorcontrib>Wu, Zhizheng</creatorcontrib><description>The dataset contains the testing stimuli and listeners' MUSHRA test responses for the Interspeech 2015 paper, "A study of speaker adaptation for DNN-based speech synthesis". In this paper, we conduct an experimental analysis of speaker adaptation for Deep Neural Network (DNN) based speech synthesis at different levels. In particular, we augment a low-dimensional speaker-specific vector with linguistic features as input to represent speaker identity, perform model adaptation to scale the hidden activation weights, and perform a feature space transformation at the output layer to modify generated acoustic features. We systematically analyse the performance of each individual adaptation technique and that of their combinations. Experimental results confirm the adaptability of the DNN, and listening tests demonstrate that the DNN can achieve significantly better adaptation performance than the hidden Markov model (HMM) baseline in terms of naturalness and speaker similarity.</description><identifier>DOI: 10.7488/ds/259</identifier><language>eng</language><publisher>University of Edinburgh. 
The Centre for Speech Technology Research (CSTR)</publisher><subject>acoustic model ; adaptation ; DNN adaptation ; speech synthesis</subject><creationdate>2015</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,1887</link.rule.ids><linktorsrc>$$Uhttps://commons.datacite.org/doi.org/10.7488/ds/259$$EView_record_in_DataCite.org$$FView_record_in_$$GDataCite.org$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Wu, Zhizheng</creatorcontrib><title>Listening test materials for "A study of speaker adaptation for DNN-based speech synthesis"</title><description>The dataset contains the testing stimuli and listeners' MUSHRA test responses for the Interspeech 2015 paper, "A study of speaker adaptation for DNN-based speech synthesis". In this paper, we conduct an experimental analysis of speaker adaptation for Deep Neural Network (DNN) based speech synthesis at different levels. In particular, we augment a low-dimensional speaker-specific vector with linguistic features as input to represent speaker identity, perform model adaptation to scale the hidden activation weights, and perform a feature space transformation at the output layer to modify generated acoustic features. We systematically analyse the performance of each individual adaptation technique and that of their combinations. 
Experimental results confirm the adaptability of the DNN, and listening tests demonstrate that the DNN can achieve significantly better adaptation performance than the hidden Markov model (HMM) baseline in terms of naturalness and speaker similarity.</description><subject>acoustic model</subject><subject>adaptation</subject><subject>DNN adaptation</subject><subject>speech synthesis</subject><fulltext>true</fulltext><rsrctype>dataset</rsrctype><creationdate>2015</creationdate><recordtype>dataset</recordtype><sourceid>PQ8</sourceid><recordid>eNqNzrsOgkAQheFtLIyXZ5hQ2CHgJWJpvMTCUNlZbEZ2kI2ykJ2x4O0F4wNYnebLya_UNInnm1WaRoajxXo7VLeLZSFn3QOEWKBCIW_xxVDUHoIdsLxNC3UB3BA-yQMabATF1u5LDlkW3pHJ9IDyErh1UhJbDsZqUHRPNPntSM1Ox-v-HBoUzK2Qbryt0Lc6iXVfpQ3rrmr5N_wANFFC0Q</recordid><startdate>2015</startdate><enddate>2015</enddate><creator>Wu, Zhizheng</creator><general>University of Edinburgh. The Centre for Speech Technology Research (CSTR)</general><scope>DYCCY</scope><scope>PQ8</scope></search><sort><creationdate>2015</creationdate><title>Listening test materials for "A study of speaker adaptation for DNN-based speech synthesis"</title><author>Wu, Zhizheng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-datacite_primary_10_7488_ds_2593</frbrgroupid><rsrctype>datasets</rsrctype><prefilter>datasets</prefilter><language>eng</language><creationdate>2015</creationdate><topic>acoustic model</topic><topic>adaptation</topic><topic>DNN adaptation</topic><topic>speech synthesis</topic><toplevel>online_resources</toplevel><creatorcontrib>Wu, Zhizheng</creatorcontrib><collection>DataCite (Open Access)</collection><collection>DataCite</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Wu, Zhizheng</au><format>book</format><genre>unknown</genre><ristype>DATA</ristype><title>Listening test materials for "A study of speaker adaptation for DNN-based speech 
synthesis"</title><date>2015</date><risdate>2015</risdate><abstract>The dataset contains the testing stimuli and listeners' MUSHRA test responses for the Interspeech 2015 paper, "A study of speaker adaptation for DNN-based speech synthesis". In this paper, we conduct an experimental analysis of speaker adaptation for Deep Neural Network (DNN) based speech synthesis at different levels. In particular, we augment a low-dimensional speaker-specific vector with linguistic features as input to represent speaker identity, perform model adaptation to scale the hidden activation weights, and perform a feature space transformation at the output layer to modify generated acoustic features. We systematically analyse the performance of each individual adaptation technique and that of their combinations. Experimental results confirm the adaptability of the DNN, and listening tests demonstrate that the DNN can achieve significantly better adaptation performance than the hidden Markov model (HMM) baseline in terms of naturalness and speaker similarity.</abstract><pub>University of Edinburgh. The Centre for Speech Technology Research (CSTR)</pub><doi>10.7488/ds/259</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.7488/ds/259
language eng
recordid cdi_datacite_primary_10_7488_ds_259
source DataCite
subjects acoustic model
adaptation
DNN adaptation
speech synthesis
title Listening test materials for "A study of speaker adaptation for DNN-based speech synthesis"
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T14%3A00%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Wu,%20Zhizheng&rft.date=2015&rft_id=info:doi/10.7488/ds/259&rft_dat=%3Cdatacite_PQ8%3E10_7488_ds_259%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true