Modeling Pitch Contour of Chinese Mandarin Sentences with the PENTA Model

In continuous speech, the pitch contour of a given syllable can vary considerably with its context. The Parallel Encoding and Target Approximation (PENTA) model is applied here to Mandarin speech synthesis with a method to predict pitch contours for Chinese syllables with different context...

Bibliographic Details
Published in:	Tsinghua science and technology 2012-04, Vol.17 (2), p.218-224
Main authors:	Pang, Hui ; Wu, Zhiyong ; Cai, Lianhong
Format:	Article
Language:	eng
Subjects:	上下文信息 (contextual information) ; 中国 (China) ; 并行编码 (parallel encoding) ; 建模 (modeling) ; 普通话 (Mandarin) ; 模型应用 (model application) ; 语音合成 (speech synthesis) ; 音高 (pitch)
Online access:	Full text

container_end_page	224
container_issue	2
container_start_page	218
container_title	Tsinghua science and technology
container_volume	17
creator	Pang, Hui ; Wu, Zhiyong ; Cai, Lianhong
description	In continuous speech, the pitch contour of a given syllable can vary considerably with its context. The Parallel Encoding and Target Approximation (PENTA) model is applied here to Mandarin speech synthesis, with a method that predicts pitch contours for Chinese syllables in different contexts by combining a Classification And Regression Tree (CART) with the PENTA model to improve prediction accuracy. CART was first used to cluster the syllables' normalized pitch contours according to the syllables' contextual information and the distances between pitch contours. The PENTA model was then trained on the average pitch contour of each cluster. The PENTA model requires an initial pitch to predict a continuous pitch contour. A Pitch Discontinuity Model (PDM) was used to predict the initial pitches at positions with voiceless consonants and prosodic boundaries. Initial tests on a Chinese four-syllable word corpus containing 2048 words were extended to tests on a continuous speech corpus containing 5445 sentences. The results are satisfactory in terms of the Root Mean Square Error (RMSE) between the predicted and original pitch contours. This method can model pitch contours for Mandarin sentences with any text for speech synthesis.
doi_str_mv	10.1109/TST.2012.6180048
format	Article
fullrecord	<record><control><sourceid>wanfang_jour_cross</sourceid><recordid>TN_cdi_wanfang_journals_qhdxxb_e201202012</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><cqvip_id>41787460</cqvip_id><wanfj_id>qhdxxb_e201202012</wanfj_id><sourcerecordid>qhdxxb_e201202012</sourcerecordid><originalsourceid>FETCH-LOGICAL-c2132-2de503e7a78602c64ae551bd5cf7c10bc30ea4ad67f04bd0290558f51234fcee3</originalsourceid><addsrcrecordid>eNo9kM1PwkAQxTdGExG9e1yPHoqzH90tR0JQSUBJqOfNdjtLS3ArbQ3439sK8TIzh_fevPwIuWcwYgzGT-k6HXFgfKRYAiCTCzJgiU4irUBddjeAjoAzeU1ummYLIFSsxYDMl1WOuzJs6KpsXUGnVWir75pWnk6LMmCDdGlDbusy0DWGFoPDhh7KtqBtgXQ1e0sn9C_jllx5u2vw7ryH5ON5lk5fo8X7y3w6WUSOM8EjnmMMArXViQLulLQYxyzLY-e1Y5A5AWilzZX2ILMc-BjiOPEx40J6hyiG5PGUe7DB27Ax265v6D6afZEfj5nBHgP0o9PCSevqqmlq9OarLj9t_WMYmB6b6bCZXmrO2DrLw9lSVGGz78j8eyTTiZYKxC_8bWnz</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Modeling Pitch Contour of Chinese Mandarin Sentences with the PENTA Model</title><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Pang, Hui ; Wu, Zhiyong ; Cai, Lianhong</creator><creatorcontrib>Pang, Hui ; Wu, Zhiyong ; Cai, Lianhong</creatorcontrib><description>In continuous speech, the pitch contour of the same syllable may vary much due to its contextual information. The Parallel Encoding and Target Approximation （PENTA） model is applied here to Mandarin speech synthesis with a method to predict pitch contours for Chinese syllables with different contexts by combining the Classification And Regression Tree （CART） with the PENTA model to improve its prediction accuracy. CART was first used to cluster the syllables＇ normalized pitch contours according to the syllables contextual information and the distances between pitch contours. The average pitch contour was used to train the PENTA model with the average contour for each cluster. 
The initial pitch is required with the PENTA model to predict a continuous pitch contour. A Pitch Discontinuity Model （PDM） was used to predict the initial pitches at positions with voiceless consonants and prosodic boundaries. Initial tests on a Chinese four-syllable word corpus containing 2048 words were extended to tests with a continuous speech corpus containing 5445 sentences. The results are satisfactory in terms of the Root Mean Square Error （RMSE） comparing the predicted pitch contour with the original contour. This method can model pitch contours for Mandarin sentences with any text for speech synthesis.</description><identifier>ISSN: 1007-0214</identifier><identifier>EISSN: 1878-7606</identifier><identifier>EISSN: 1007-0214</identifier><identifier>DOI: 10.1109/TST.2012.6180048</identifier><language>eng</language><publisher>Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems,Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China</publisher><subject>上下文信息 ; 中国 ; 并行编码 ; 建模 ; 普通话 ; 模型应用 ; 语音合成 ; 音高</subject><ispartof>Tsinghua science and technology, 2012-04, Vol.17 (2), p.218-224</ispartof><rights>Copyright © Wanfang Data Co. Ltd. 
All Rights Reserved.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttp://image.cqvip.com/vip1000/qk/85782X/85782X.jpg</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Pang, Hui</creatorcontrib><creatorcontrib>Wu, Zhiyong</creatorcontrib><creatorcontrib>Cai, Lianhong</creatorcontrib><title>Modeling Pitch Contour of Chinese Mandarin Sentences with the PENTA Model</title><title>Tsinghua science and technology</title><addtitle>Tsinghua Science and Technology</addtitle><description>In continuous speech, the pitch contour of the same syllable may vary much due to its contextual information. The Parallel Encoding and Target Approximation （PENTA） model is applied here to Mandarin speech synthesis with a method to predict pitch contours for Chinese syllables with different contexts by combining the Classification And Regression Tree （CART） with the PENTA model to improve its prediction accuracy. CART was first used to cluster the syllables＇ normalized pitch contours according to the syllables contextual information and the distances between pitch contours. The average pitch contour was used to train the PENTA model with the average contour for each cluster. The initial pitch is required with the PENTA model to predict a continuous pitch contour. A Pitch Discontinuity Model （PDM） was used to predict the initial pitches at positions with voiceless consonants and prosodic boundaries. Initial tests on a Chinese four-syllable word corpus containing 2048 words were extended to tests with a continuous speech corpus containing 5445 sentences. The results are satisfactory in terms of the Root Mean Square Error （RMSE） comparing the predicted pitch contour with the original contour. 
This method can model pitch contours for Mandarin sentences with any text for speech synthesis.</description><subject>上下文信息</subject><subject>中国</subject><subject>并行编码</subject><subject>建模</subject><subject>普通话</subject><subject>模型应用</subject><subject>语音合成</subject><subject>音高</subject><issn>1007-0214</issn><issn>1878-7606</issn><issn>1007-0214</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2012</creationdate><recordtype>article</recordtype><recordid>eNo9kM1PwkAQxTdGExG9e1yPHoqzH90tR0JQSUBJqOfNdjtLS3ArbQ3439sK8TIzh_fevPwIuWcwYgzGT-k6HXFgfKRYAiCTCzJgiU4irUBddjeAjoAzeU1ummYLIFSsxYDMl1WOuzJs6KpsXUGnVWir75pWnk6LMmCDdGlDbusy0DWGFoPDhh7KtqBtgXQ1e0sn9C_jllx5u2vw7ryH5ON5lk5fo8X7y3w6WUSOM8EjnmMMArXViQLulLQYxyzLY-e1Y5A5AWilzZX2ILMc-BjiOPEx40J6hyiG5PGUe7DB27Ax265v6D6afZEfj5nBHgP0o9PCSevqqmlq9OarLj9t_WMYmB6b6bCZXmrO2DrLw9lSVGGz78j8eyTTiZYKxC_8bWnz</recordid><startdate>201204</startdate><enddate>201204</enddate><creator>Pang, Hui</creator><creator>Wu, Zhiyong</creator><creator>Cai, Lianhong</creator><general>Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems,Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China</general><scope>2RA</scope><scope>92L</scope><scope>CQIGP</scope><scope>~WA</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>2B.</scope><scope>4A8</scope><scope>92I</scope><scope>93N</scope><scope>PSX</scope><scope>TCJ</scope></search><sort><creationdate>201204</creationdate><title>Modeling Pitch Contour of Chinese Mandarin Sentences with the PENTA Model</title><author>Pang, Hui ; Wu, Zhiyong ; Cai, 
Lianhong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c2132-2de503e7a78602c64ae551bd5cf7c10bc30ea4ad67f04bd0290558f51234fcee3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2012</creationdate><topic>上下文信息</topic><topic>中国</topic><topic>并行编码</topic><topic>建模</topic><topic>普通话</topic><topic>模型应用</topic><topic>语音合成</topic><topic>音高</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Pang, Hui</creatorcontrib><creatorcontrib>Wu, Zhiyong</creatorcontrib><creatorcontrib>Cai, Lianhong</creatorcontrib><collection>中文科技期刊数据库</collection><collection>中文科技期刊数据库-CALIS站点</collection><collection>中文科技期刊数据库-7.0平台</collection><collection>中文科技期刊数据库- 镜像站点</collection><collection>CrossRef</collection><collection>Wanfang Data Journals - Hong Kong</collection><collection>WANFANG Data Centre</collection><collection>Wanfang Data Journals</collection><collection>万方数据期刊 - 香港版</collection><collection>China Online Journals (COJ)</collection><collection>China Online Journals (COJ)</collection><jtitle>Tsinghua science and technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Pang, Hui</au><au>Wu, Zhiyong</au><au>Cai, Lianhong</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Modeling Pitch Contour of Chinese Mandarin Sentences with the PENTA Model</atitle><jtitle>Tsinghua science and technology</jtitle><addtitle>Tsinghua Science and Technology</addtitle><date>2012-04</date><risdate>2012</risdate><volume>17</volume><issue>2</issue><spage>218</spage><epage>224</epage><pages>218-224</pages><issn>1007-0214</issn><eissn>1878-7606</eissn><eissn>1007-0214</eissn><abstract>In continuous speech, the pitch contour of the same syllable may vary much due to its contextual information. 
The Parallel Encoding and Target Approximation （PENTA） model is applied here to Mandarin speech synthesis with a method to predict pitch contours for Chinese syllables with different contexts by combining the Classification And Regression Tree （CART） with the PENTA model to improve its prediction accuracy. CART was first used to cluster the syllables＇ normalized pitch contours according to the syllables contextual information and the distances between pitch contours. The average pitch contour was used to train the PENTA model with the average contour for each cluster. The initial pitch is required with the PENTA model to predict a continuous pitch contour. A Pitch Discontinuity Model （PDM） was used to predict the initial pitches at positions with voiceless consonants and prosodic boundaries. Initial tests on a Chinese four-syllable word corpus containing 2048 words were extended to tests with a continuous speech corpus containing 5445 sentences. The results are satisfactory in terms of the Root Mean Square Error （RMSE） comparing the predicted pitch contour with the original contour. This method can model pitch contours for Mandarin sentences with any text for speech synthesis.</abstract><pub>Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems,Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China</pub><doi>10.1109/TST.2012.6180048</doi><tpages>7</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1007-0214
ispartof	Tsinghua science and technology, 2012-04, Vol.17 (2), p.218-224
issn	1007-0214 1878-7606 1007-0214
language	eng
recordid	cdi_wanfang_journals_qhdxxb_e201202012
source	Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects	上下文信息 ; 中国 ; 并行编码 ; 建模 ; 普通话 ; 模型应用 ; 语音合成 ; 音高
title	Modeling Pitch Contour of Chinese Mandarin Sentences with the PENTA Model
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T18%3A33%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-wanfang_jour_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Modeling%20Pitch%20Contour%20of%20Chinese%20Mandarin%20Sentences%20with%20the%20PENTA%20Model&rft.jtitle=Tsinghua%20science%20and%20technology&rft.au=Pang,%20Hui&rft.date=2012-04&rft.volume=17&rft.issue=2&rft.spage=218&rft.epage=224&rft.pages=218-224&rft.issn=1007-0214&rft.eissn=1878-7606&rft_id=info:doi/10.1109/TST.2012.6180048&rft_dat=%3Cwanfang_jour_cross%3Eqhdxxb_e201202012%3C/wanfang_jour_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_cqvip_id=41787460&rft_wanfj_id=qhdxxb_e201202012&rfr_iscdi=true