Text-to-speech synthesis system with Arabic diacritic recognition system

• We developed an Arabic text-to-speech system, including a diacritization system.
• The speech synthesis system is based on a statistical parametric approach.
• We address the accuracy of diacritic and acoustic models.
• We proposed a diacritization system based on the position of the current letter.
• Neural network...

Detailed description

Bibliographic details
Published in:	Computer speech & language 2015-11, Vol.34 (1), p.43-60
Main authors:	Rebai, Ilyes ; BenAyed, Yassine
Format:	Article
Language:	eng
Subjects:	Accuracy ; Acoustics ; Deep neural networks ; Diacritization system ; Mathematical models ; Natural language processing ; Neural networks ; Speech ; Speech recognition ; Statistical parametric ; Synthesis ; Text-to-speech synthesis ; Texts
Online access:	Full text

container_end_page	60
container_issue	1
container_start_page	43
container_title	Computer speech & language
container_volume	34
creator	Rebai, Ilyes ; BenAyed, Yassine
description	• We developed an Arabic text-to-speech system, including a diacritization system. • The speech synthesis system is based on a statistical parametric approach. • We address the accuracy of diacritic and acoustic models. • We proposed a diacritization system based on the position of the current letter. • A synthesis system based on one neural network per unit type generates high-quality speech. Text-to-speech synthesis has been widely studied for many languages. However, speech synthesis for the Arabic language has not made sufficient progress and is still in its early stages. Statistical parametric synthesis based on hidden Markov models was the most commonly applied approach for the Arabic language. Recently, speech synthesized with deep neural networks was found to be as intelligible as the human voice. This paper describes a Text-To-Speech (TTS) synthesis system for the Modern Standard Arabic language based on a statistical parametric approach and Mel-cepstral coefficients. Deep neural networks have achieved state-of-the-art performance in a wide range of tasks, including speech synthesis. Our TTS system includes a diacritization system, which is very important for Arabic TTS applications. Our diacritization system is also based on deep neural networks. In addition to the use of deep techniques, different methods were also proposed to model the acoustic parameters in order to address the problem of acoustic model accuracy. They are based on linguistic and acoustic characteristics (e.g. a letter-position-based diacritization system, a unit-type-based synthesis system, a diacritic-mark-based synthesis system) and on deep learning techniques (stacked generalization techniques). Experimental results show that our diacritization system can generate diacritized text with high accuracy. As regards the speech synthesis system, the experimental results and subjective evaluation show that our proposed synthesis method can generate intelligible and natural speech.
doi_str_mv	10.1016/j.csl.2015.04.002
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1709736591</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0885230815000418</els_id><sourcerecordid>1709736591</sourcerecordid><originalsourceid>FETCH-LOGICAL-c330t-c201c42027b318c5a469fc4263cb426701aa34d5693b2a09f23406f69cf2932a3</originalsourceid><addsrcrecordid>eNp9kDFPwzAQhS0EEqXwA9gysiSc7diJxVRVQJEqsZTZcpwLdZUmxXaB_ntctTPL3dPpvdPdR8g9hYIClY-bwoa-YEBFAWUBwC7IhIISec0lvyQTqGuRMw71NbkJYQMAUpTVhCxW-BvzOOZhh2jXWTgMcY3BhaRCxG324-I6m3nTOJu1zljvYlIe7fg5JDkOZ-MtuepMH_Du3Kfk4-V5NV_ky_fXt_lsmVvOIeY2nWhLBqxqOK2tMKVUXRpIbptUK6DG8LIVUvGGGVAd4yXITirbMcWZ4VPycNq78-PXHkPUWxcs9r0ZcNwHTStQFZdC0WSlJ6v1YwgeO73zbmv8QVPQR2p6oxM1faSmodSJWso8nTKYfvh26HWwDgeLrUs_R92O7p_0H2hodJE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1709736591</pqid></control><display><type>article</type><title>Text-to-speech synthesis system with Arabic diacritic recognition system</title><source>ScienceDirect Journals (5 years ago - present)</source><creator>Rebai, Ilyes ; BenAyed, Yassine</creator><creatorcontrib>Rebai, Ilyes ; BenAyed, Yassine</creatorcontrib><description>•We developed an Arabic text-to-speech system, including a diacritization system.•The speech synthesis system is based on statistical parametric.•We address the accuracy of diacritic and acoustic models.•We proposed a diacritization system based on the position of the current letter.•Neural network per unit type based synthesis system generates high speech quality. Text-to-speech synthesis system has been widely studied for many languages. However, speech synthesis for Arabic language has not sufficient progresses and it is still in its first stage. Statistical parametric synthesis based on hidden Markov models was the most commonly applied approach for Arabic language. 
Recently, synthesized speech quality based on deep neural networks was found as intelligible as human voice. This paper describes a Text-To-Speech (TTS) synthesis system for modern standard Arabic language based on statistical parametric approach and Mel-cepstral coefficients. Deep neural networks achieved state-of-the-art performance in a wide range of tasks, including speech synthesis. Our TTS system includes a diacritization system which is very important for Arabic TTS application. Our diacritization system is also based on deep neural networks. In addition to the use deep techniques, different methods were also proposed to model the acoustic parameters in order to address the problem of acoustic models accuracy. They are based on linguistic and acoustic characteristics (e.g. letter position based diacritization system, unit types based synthesis system, diacritic marks based synthesis system) and based on deep learning techniques (stacked generalization techniques). Experimental results show that our diacritization system can generate a diacritized text with high accuracy. 
As regards the speech synthesis system, the experimental results and subjective evaluation show that our proposed method for synthesis system can generate intelligible and natural speech.</description><identifier>ISSN: 0885-2308</identifier><identifier>EISSN: 1095-8363</identifier><identifier>DOI: 10.1016/j.csl.2015.04.002</identifier><language>eng</language><publisher>Elsevier Ltd</publisher><subject>Accuracy ; Acoustics ; Deep neural networks ; Diacritization system ; Mathematical models ; Natural language processing ; Neural networks ; Speech ; Speech recognition ; Statistical parametric ; Synthesis ; Text-to-speech synthesis ; Texts</subject><ispartof>Computer speech & language, 2015-11, Vol.34 (1), p.43-60</ispartof><rights>2015 Elsevier Ltd</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c330t-c201c42027b318c5a469fc4263cb426701aa34d5693b2a09f23406f69cf2932a3</citedby><cites>FETCH-LOGICAL-c330t-c201c42027b318c5a469fc4263cb426701aa34d5693b2a09f23406f69cf2932a3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.csl.2015.04.002$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,777,781,3537,27905,27906,45976</link.rule.ids></links><search><creatorcontrib>Rebai, Ilyes</creatorcontrib><creatorcontrib>BenAyed, Yassine</creatorcontrib><title>Text-to-speech synthesis system with Arabic diacritic recognition system</title><title>Computer speech & language</title><description>•We developed an Arabic text-to-speech system, including a diacritization system.•The speech synthesis system is based on statistical parametric.•We address the accuracy of diacritic and acoustic models.•We proposed a diacritization system based on the position of the current letter.•Neural network per unit type based synthesis system generates high speech 
quality. Text-to-speech synthesis system has been widely studied for many languages. However, speech synthesis for Arabic language has not sufficient progresses and it is still in its first stage. Statistical parametric synthesis based on hidden Markov models was the most commonly applied approach for Arabic language. Recently, synthesized speech quality based on deep neural networks was found as intelligible as human voice. This paper describes a Text-To-Speech (TTS) synthesis system for modern standard Arabic language based on statistical parametric approach and Mel-cepstral coefficients. Deep neural networks achieved state-of-the-art performance in a wide range of tasks, including speech synthesis. Our TTS system includes a diacritization system which is very important for Arabic TTS application. Our diacritization system is also based on deep neural networks. In addition to the use deep techniques, different methods were also proposed to model the acoustic parameters in order to address the problem of acoustic models accuracy. They are based on linguistic and acoustic characteristics (e.g. letter position based diacritization system, unit types based synthesis system, diacritic marks based synthesis system) and based on deep learning techniques (stacked generalization techniques). Experimental results show that our diacritization system can generate a diacritized text with high accuracy. 
As regards the speech synthesis system, the experimental results and subjective evaluation show that our proposed method for synthesis system can generate intelligible and natural speech.</description><subject>Accuracy</subject><subject>Acoustics</subject><subject>Deep neural networks</subject><subject>Diacritization system</subject><subject>Mathematical models</subject><subject>Natural language processing</subject><subject>Neural networks</subject><subject>Speech</subject><subject>Speech recognition</subject><subject>Statistical parametric</subject><subject>Synthesis</subject><subject>Text-to-speech synthesis</subject><subject>Texts</subject><issn>0885-2308</issn><issn>1095-8363</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><recordid>eNp9kDFPwzAQhS0EEqXwA9gysiSc7diJxVRVQJEqsZTZcpwLdZUmxXaB_ntctTPL3dPpvdPdR8g9hYIClY-bwoa-YEBFAWUBwC7IhIISec0lvyQTqGuRMw71NbkJYQMAUpTVhCxW-BvzOOZhh2jXWTgMcY3BhaRCxG324-I6m3nTOJu1zljvYlIe7fg5JDkOZ-MtuepMH_Du3Kfk4-V5NV_ky_fXt_lsmVvOIeY2nWhLBqxqOK2tMKVUXRpIbptUK6DG8LIVUvGGGVAd4yXITirbMcWZ4VPycNq78-PXHkPUWxcs9r0ZcNwHTStQFZdC0WSlJ6v1YwgeO73zbmv8QVPQR2p6oxM1faSmodSJWso8nTKYfvh26HWwDgeLrUs_R92O7p_0H2hodJE</recordid><startdate>20151101</startdate><enddate>20151101</enddate><creator>Rebai, Ilyes</creator><creator>BenAyed, Yassine</creator><general>Elsevier Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20151101</creationdate><title>Text-to-speech synthesis system with Arabic diacritic recognition system</title><author>Rebai, Ilyes ; BenAyed, 
Yassine</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c330t-c201c42027b318c5a469fc4263cb426701aa34d5693b2a09f23406f69cf2932a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Accuracy</topic><topic>Acoustics</topic><topic>Deep neural networks</topic><topic>Diacritization system</topic><topic>Mathematical models</topic><topic>Natural language processing</topic><topic>Neural networks</topic><topic>Speech</topic><topic>Speech recognition</topic><topic>Statistical parametric</topic><topic>Synthesis</topic><topic>Text-to-speech synthesis</topic><topic>Texts</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Rebai, Ilyes</creatorcontrib><creatorcontrib>BenAyed, Yassine</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Computer speech & language</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Rebai, Ilyes</au><au>BenAyed, Yassine</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Text-to-speech synthesis system with Arabic diacritic recognition system</atitle><jtitle>Computer speech & language</jtitle><date>2015-11-01</date><risdate>2015</risdate><volume>34</volume><issue>1</issue><spage>43</spage><epage>60</epage><pages>43-60</pages><issn>0885-2308</issn><eissn>1095-8363</eissn><abstract>•We developed an Arabic text-to-speech system, including a diacritization system.•The speech synthesis system is 
based on statistical parametric.•We address the accuracy of diacritic and acoustic models.•We proposed a diacritization system based on the position of the current letter.•Neural network per unit type based synthesis system generates high speech quality. Text-to-speech synthesis system has been widely studied for many languages. However, speech synthesis for Arabic language has not sufficient progresses and it is still in its first stage. Statistical parametric synthesis based on hidden Markov models was the most commonly applied approach for Arabic language. Recently, synthesized speech quality based on deep neural networks was found as intelligible as human voice. This paper describes a Text-To-Speech (TTS) synthesis system for modern standard Arabic language based on statistical parametric approach and Mel-cepstral coefficients. Deep neural networks achieved state-of-the-art performance in a wide range of tasks, including speech synthesis. Our TTS system includes a diacritization system which is very important for Arabic TTS application. Our diacritization system is also based on deep neural networks. In addition to the use deep techniques, different methods were also proposed to model the acoustic parameters in order to address the problem of acoustic models accuracy. They are based on linguistic and acoustic characteristics (e.g. letter position based diacritization system, unit types based synthesis system, diacritic marks based synthesis system) and based on deep learning techniques (stacked generalization techniques). Experimental results show that our diacritization system can generate a diacritized text with high accuracy. As regards the speech synthesis system, the experimental results and subjective evaluation show that our proposed method for synthesis system can generate intelligible and natural speech.</abstract><pub>Elsevier Ltd</pub><doi>10.1016/j.csl.2015.04.002</doi><tpages>18</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0885-2308
ispartof	Computer speech & language, 2015-11, Vol.34 (1), p.43-60
issn	0885-2308 1095-8363
language	eng
recordid	cdi_proquest_miscellaneous_1709736591
source	ScienceDirect Journals (5 years ago - present)
subjects	Accuracy ; Acoustics ; Deep neural networks ; Diacritization system ; Mathematical models ; Natural language processing ; Neural networks ; Speech ; Speech recognition ; Statistical parametric ; Synthesis ; Text-to-speech synthesis ; Texts
title	Text-to-speech synthesis system with Arabic diacritic recognition system
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T17%3A52%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Text-to-speech%20synthesis%20system%20with%20Arabic%20diacritic%20recognition%20system&rft.jtitle=Computer%20speech%20&%20language&rft.au=Rebai,%20Ilyes&rft.date=2015-11-01&rft.volume=34&rft.issue=1&rft.spage=43&rft.epage=60&rft.pages=43-60&rft.issn=0885-2308&rft.eissn=1095-8363&rft_id=info:doi/10.1016/j.csl.2015.04.002&rft_dat=%3Cproquest_cross%3E1709736591%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1709736591&rft_id=info:pmid/&rft_els_id=S0885230815000418&rfr_iscdi=true