Text-to-speech synthesis system with Arabic diacritic recognition system

• We developed an Arabic text-to-speech system, including a diacritization system. • The speech synthesis system is based on the statistical parametric approach. • We address the accuracy of the diacritic and acoustic models. • We propose a diacritization system based on the position of the current letter. • Neural network...

Bibliographic details
Published in: Computer speech & language 2015-11, Vol.34 (1), p.43-60
Main authors: Rebai, Ilyes, BenAyed, Yassine
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 60
container_issue 1
container_start_page 43
container_title Computer speech & language
container_volume 34
creator Rebai, Ilyes
BenAyed, Yassine
description • We developed an Arabic text-to-speech system, including a diacritization system. • The speech synthesis system is based on the statistical parametric approach. • We address the accuracy of the diacritic and acoustic models. • We propose a diacritization system based on the position of the current letter. • A neural-network-per-unit-type synthesis system generates high-quality speech. Text-to-speech synthesis has been widely studied for many languages. However, speech synthesis for Arabic has not made sufficient progress and is still at an early stage. Statistical parametric synthesis based on hidden Markov models has been the most commonly applied approach for Arabic. Recently, speech synthesized with deep neural networks was found to be as intelligible as the human voice. This paper describes a text-to-speech (TTS) synthesis system for Modern Standard Arabic based on the statistical parametric approach and Mel-cepstral coefficients. Deep neural networks have achieved state-of-the-art performance in a wide range of tasks, including speech synthesis. Our TTS system includes a diacritization system, which is very important for Arabic TTS applications. The diacritization system is also based on deep neural networks. In addition to the use of deep techniques, different methods are proposed to model the acoustic parameters in order to address the accuracy of the acoustic models. They are based on linguistic and acoustic characteristics (e.g. a letter-position-based diacritization system, a unit-type-based synthesis system, a diacritic-mark-based synthesis system) and on deep learning techniques (stacked generalization). Experimental results show that our diacritization system can generate diacritized text with high accuracy. For the speech synthesis system, the experimental results and subjective evaluation show that the proposed method can generate intelligible and natural speech.
doi_str_mv 10.1016/j.csl.2015.04.002
format Article
fullrecord publisher: Elsevier Ltd; rights: 2015 Elsevier Ltd; EISSN: 1095-8363; peer reviewed; pages: 18; ProQuest ID: 1709736591; Elsevier ID: S0885230815000418; indexed in: CrossRef, Computer and Information Systems Abstracts, Technology Research Database, ProQuest Computer Science Collection, Advanced Technologies Database with Aerospace
fulltext fulltext
identifier ISSN: 0885-2308
ispartof Computer speech & language, 2015-11, Vol.34 (1), p.43-60
issn 0885-2308
1095-8363
language eng
recordid cdi_proquest_miscellaneous_1709736591
source ScienceDirect Journals (5 years ago - present)
subjects Accuracy
Acoustics
Deep neural networks
Diacritization system
Mathematical models
Natural language processing
Neural networks
Speech
Speech recognition
Statistical parametric
Synthesis
Text-to-speech synthesis
Texts
title Text-to-speech synthesis system with Arabic diacritic recognition system
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T17%3A52%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Text-to-speech%20synthesis%20system%20with%20Arabic%20diacritic%20recognition%20system&rft.jtitle=Computer%20speech%20&%20language&rft.au=Rebai,%20Ilyes&rft.date=2015-11-01&rft.volume=34&rft.issue=1&rft.spage=43&rft.epage=60&rft.pages=43-60&rft.issn=0885-2308&rft.eissn=1095-8363&rft_id=info:doi/10.1016/j.csl.2015.04.002&rft_dat=%3Cproquest_cross%3E1709736591%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1709736591&rft_id=info:pmid/&rft_els_id=S0885230815000418&rfr_iscdi=true