TTS-Portuguese Corpus: a corpus for speech synthesis in Brazilian Portuguese

Speech provides a natural way for human–computer interaction. In particular, speech synthesis systems are popular in many applications, such as personal assistants, GPS applications, screen readers and accessibility tools. However, not all languages are at the same level in terms of resources and systems for speech synthesis. This work creates publicly available resources for Brazilian Portuguese in the form of a novel dataset together with deep learning models for end-to-end speech synthesis. The dataset contains 10.5 hours of speech from a single speaker, on which a Tacotron 2 model with the RTISI-LA vocoder achieved the best performance, reaching a MOS of 4.03. The results are comparable to related work on English and to the state of the art in European Portuguese. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, and by CNPq (National Council for Scientific and Technological Development) Grant 304266/2020-5.

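The MOS of 4.03 cited above is a mean opinion score: the average of subjective listener ratings of naturalness, typically on a 1-5 scale. As a minimal, hypothetical sketch (the ratings below are invented for illustration and are not the paper's evaluation data), a MOS and its 95% confidence interval can be computed as follows:

```python
import numpy as np
from scipy import stats

def mean_opinion_score(ratings, confidence=0.95):
    """Mean opinion score (MOS) with a t-based confidence half-width.

    `ratings` is a flat list of listener scores on a 1-5 scale.
    """
    scores = np.asarray(ratings, dtype=float)
    mos = scores.mean()
    # Standard error of the mean, then the t-distribution half-width.
    sem = stats.sem(scores)
    half_width = sem * stats.t.ppf((1 + confidence) / 2, df=len(scores) - 1)
    return mos, half_width

# Hypothetical ratings, only to show the calculation; not from the paper.
example_ratings = [4, 5, 4, 3, 4, 5, 4, 4, 3, 5]
mos, ci = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f} (95% CI)")
```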

Bibliographic details
Published in: Language Resources and Evaluation, 2022-09, Vol. 56 (3), pp. 1043-1055
Main authors: Casanova, Edresson; Junior, Arnaldo Candido; Shulby, Christopher; Oliveira, Frederico Santos de; Teixeira, João; Ponti, Moacir Antonelli; Aluísio, Sandra
Format: Article
Language: English
Subjects: Acoustics; Audiobooks; Brazilian Portuguese; Computational Linguistics; Computer Science; Computerized corpora; Corpora; Corpus linguistics; Datasets; Deep learning; English language; Human-computer interaction; Language and Literature; Linguistics; Neural networks; Portuguese; Portuguese language; Project Notes; Social Sciences; Speech; Speech recognition; Speech synthesis; Text-to-speech; TTS
Online access: Full text
DOI: 10.1007/s10579-021-09570-4
ISSN: 1574-020X
EISSN: 1574-0218
Source: SpringerLink Journals