TTS-Portuguese Corpus: a corpus for speech synthesis in Brazilian Portuguese
Speech provides a natural way for human–computer interaction. In particular, speech synthesis systems are popular in different applications, such as personal assistants, GPS applications, screen readers and accessibility tools. However, not all languages are on the same level in terms of resour...
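Published in: | Language resources and evaluation 2022-09, Vol.56 (3), p.1043-1055 |
---|---|
Main authors: | Casanova, Edresson; Junior, Arnaldo Candido; Shulby, Christopher; Oliveira, Frederico Santos de; Teixeira, João; Ponti, Moacir Antonelli; Aluísio, Sandra |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |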
container_end_page | 1055 |
---|---|
container_issue | 3 |
container_start_page | 1043 |
container_title | Language resources and evaluation |
container_volume | 56 |
creator | Casanova, Edresson; Junior, Arnaldo Candido; Shulby, Christopher; Oliveira, Frederico Santos de; Teixeira, João; Ponti, Moacir Antonelli; Aluísio, Sandra |
description | Speech provides a natural way for human–computer interaction. In particular, speech synthesis systems are popular in different applications, such as personal assistants, GPS applications, screen readers and accessibility tools. However, not all languages are on the same level in terms of resources and systems for speech synthesis. This work creates publicly available resources for Brazilian Portuguese in the form of a novel dataset along with deep learning models for end-to-end speech synthesis. The dataset contains 10.5 h of speech from a single speaker, on which a Tacotron 2 model with the RTISI-LA vocoder performed best, achieving a mean opinion score (MOS) of 4.03. The results are comparable to related work on English and to the state of the art in European Portuguese.
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001, as well as CNPq (National Council of Technological and Scientific Development) Grant 304266/2020-5 |
doi_str_mv | 10.1007/s10579-021-09570-4 |
format | Article |
fullrecord | Record ID: TN_cdi_proquest_journals_2703667428 (source: proquest_cross; ProQuest ID 2703667428). Type: journal article. Abbreviated journal title: Lang Resources & Evaluation. Publication date: 2022-09-01. Pages: 1043-1055 (13 pages). ISSN: 1574-020X; EISSN: 1574-020X, 1574-0218. DOI: 10.1007/s10579-021-09570-4. Publisher: Dordrecht: Springer Netherlands (Springer Nature B.V). Rights: The Author(s), under exclusive licence to Springer Nature B.V. 2021. Peer reviewed; free to read. Author ORCIDs: 0000-0002-6679-5702; 0000-0003-0160-7173; 0000-0002-5647-0891; 0000-0002-5885-6747; 0000-0001-5108-2630; 0000-0003-2059-9463; 0000-0001-9637-9657. PDF: https://link.springer.com/content/pdf/10.1007/s10579-021-09570-4. HTML: https://link.springer.com/10.1007/s10579-021-09570-4 |
fulltext | fulltext |
identifier | ISSN: 1574-020X |
ispartof | Language resources and evaluation, 2022-09, Vol.56 (3), p.1043-1055 |
issn | 1574-020X 1574-020X 1574-0218 |
language | eng |
recordid | cdi_proquest_journals_2703667428 |
source | SpringerLink Journals |
subjects | Acoustics; Audiobooks; Brazilian Portuguese; Computational Linguistics; Computer Science; Computerized corpora; Corpora; Corpus linguistics; Datasets; Deep learning; English language; Human-computer interaction; Language and Literature; Linguistics; Neural networks; Portuguese; Portuguese language; Project Notes; Social Sciences; Speech; Speech recognition; Speech synthesis; Text-to-speech; TTS |
title | TTS-Portuguese Corpus: a corpus for speech synthesis in Brazilian Portuguese |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T06%3A18%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=TTS-Portuguese%20Corpus:%20a%20corpus%20for%20speech%20synthesis%20in%20Brazilian%20Portuguese&rft.jtitle=Language%20resources%20and%20evaluation&rft.au=Casanova,%20Edresson&rft.date=2022-09-01&rft.volume=56&rft.issue=3&rft.spage=1043&rft.epage=1055&rft.pages=1043-1055&rft.issn=1574-020X&rft.eissn=1574-020X&rft_id=info:doi/10.1007/s10579-021-09570-4&rft_dat=%3Cproquest_cross%3E2703667428%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2703667428&rft_id=info:pmid/&rfr_iscdi=true |