Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus
Taiwanese Hokkien is declining in use and status due to a language shift towards Mandarin in Taiwan. This is partly why it is a low resource language in NLP and speech research today. To ensure that the state of the art in speech processing does not leave Taiwanese Hokkien behind, we contribute a 1....
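Main authors: | Chou, Yi-Hui; Chang, Kalvin; Wu, Meng-Ju; Ou, Winston; Bi, Alice Wen-Hsin; Yang, Carol; Chen, Bryan Y; Pai, Rong-Wei; Yeh, Po-Yen; Chiang, Jo-Peng; Phoann, Iu-Tshian; Chang, Winnie; Cui, Chenxuan; Chen, Noel; Shi, Jiatong |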
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Chou, Yi-Hui ; Chang, Kalvin ; Wu, Meng-Ju ; Ou, Winston ; Bi, Alice Wen-Hsin ; Yang, Carol ; Chen, Bryan Y ; Pai, Rong-Wei ; Yeh, Po-Yen ; Chiang, Jo-Peng ; Phoann, Iu-Tshian ; Chang, Winnie ; Cui, Chenxuan ; Chen, Noel ; Shi, Jiatong |
description | Taiwanese Hokkien is declining in use and status due to a language shift
towards Mandarin in Taiwan. This is partly why it is a low resource language in
NLP and speech research today. To ensure that the state of the art in speech
processing does not leave Taiwanese Hokkien behind, we contribute a 1.5-hour
dataset of Taiwanese Hokkien to ML-SUPERB's hidden set. Evaluating ML-SUPERB's
suite of self-supervised learning (SSL) speech representations on our dataset,
we find that model size does not consistently determine performance. In fact,
certain smaller models outperform larger ones. Furthermore, linguistic
alignment between pretraining data and the target language plays a crucial
role. |
doi_str_mv | 10.48550/arxiv.2312.06668 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2312_06668</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2312_06668</sourcerecordid><originalsourceid>FETCH-LOGICAL-a678-cd55cbfd8298215719cf0eb99a08ccba787ab9f96c2fa02dec8a26b20b1989963</originalsourceid><addsrcrecordid>eNotz7FOwzAUQFEvHVDLBzDhH0hqO9ixxypqKVIRQ7NHz_YzWA1JZJMAf48oTHe70iHkjrPyQUvJtpC-4lKKiouSKaX0DTnsF-hn-IjDKz1jH4o8T5iWmNHT84To3ujz6LHPdBwo0BbiJwyYkR7HyyXiQJsxTXPekFWAPuPtf9ekPezb5licXh6fmt2pAFXrwnkpnQ1eC6MFlzU3LjC0xgDTzlmodQ3WBKOcCMCER6dBKCuY5UYbo6o1uf_bXiHdlOI7pO_uF9RdQdUPHz5Gbg</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus</title><source>arXiv.org</source><creator>Chou, Yi-Hui ; Chang, Kalvin ; Wu, Meng-Ju ; Ou, Winston ; Bi, Alice Wen-Hsin ; Yang, Carol ; Chen, Bryan Y ; Pai, Rong-Wei ; Yeh, Po-Yen ; Chiang, Jo-Peng ; Phoann, Iu-Tshian ; Chang, Winnie ; Cui, Chenxuan ; Chen, Noel ; Shi, Jiatong</creator><creatorcontrib>Chou, Yi-Hui ; Chang, Kalvin ; Wu, Meng-Ju ; Ou, Winston ; Bi, Alice Wen-Hsin ; Yang, Carol ; Chen, Bryan Y ; Pai, Rong-Wei ; Yeh, Po-Yen ; Chiang, Jo-Peng ; Phoann, Iu-Tshian ; Chang, Winnie ; Cui, Chenxuan ; Chen, Noel ; Shi, Jiatong</creatorcontrib><description>Taiwanese Hokkien is declining in use and status due to a language shift
towards Mandarin in Taiwan. This is partly why it is a low resource language in
NLP and speech research today. To ensure that the state of the art in speech
processing does not leave Taiwanese Hokkien behind, we contribute a 1.5-hour
dataset of Taiwanese Hokkien to ML-SUPERB's hidden set. Evaluating ML-SUPERB's
suite of self-supervised learning (SSL) speech representations on our dataset,
we find that model size does not consistently determine performance. In fact,
certain smaller models outperform larger ones. Furthermore, linguistic
alignment between pretraining data and the target language plays a crucial
role.</description><identifier>DOI: 10.48550/arxiv.2312.06668</identifier><language>eng</language><subject>Computer Science - Computation and Language ; Computer Science - Sound</subject><creationdate>2023-12</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2312.06668$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2312.06668$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Chou, Yi-Hui</creatorcontrib><creatorcontrib>Chang, Kalvin</creatorcontrib><creatorcontrib>Wu, Meng-Ju</creatorcontrib><creatorcontrib>Ou, Winston</creatorcontrib><creatorcontrib>Bi, Alice Wen-Hsin</creatorcontrib><creatorcontrib>Yang, Carol</creatorcontrib><creatorcontrib>Chen, Bryan Y</creatorcontrib><creatorcontrib>Pai, Rong-Wei</creatorcontrib><creatorcontrib>Yeh, Po-Yen</creatorcontrib><creatorcontrib>Chiang, Jo-Peng</creatorcontrib><creatorcontrib>Phoann, Iu-Tshian</creatorcontrib><creatorcontrib>Chang, Winnie</creatorcontrib><creatorcontrib>Cui, Chenxuan</creatorcontrib><creatorcontrib>Chen, Noel</creatorcontrib><creatorcontrib>Shi, Jiatong</creatorcontrib><title>Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus</title><description>Taiwanese Hokkien is declining in use and status due to a language shift
towards Mandarin in Taiwan. This is partly why it is a low resource language in
NLP and speech research today. To ensure that the state of the art in speech
processing does not leave Taiwanese Hokkien behind, we contribute a 1.5-hour
dataset of Taiwanese Hokkien to ML-SUPERB's hidden set. Evaluating ML-SUPERB's
suite of self-supervised learning (SSL) speech representations on our dataset,
we find that model size does not consistently determine performance. In fact,
certain smaller models outperform larger ones. Furthermore, linguistic
alignment between pretraining data and the target language plays a crucial
role.</description><subject>Computer Science - Computation and Language</subject><subject>Computer Science - Sound</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotz7FOwzAUQFEvHVDLBzDhH0hqO9ixxypqKVIRQ7NHz_YzWA1JZJMAf48oTHe70iHkjrPyQUvJtpC-4lKKiouSKaX0DTnsF-hn-IjDKz1jH4o8T5iWmNHT84To3ujz6LHPdBwo0BbiJwyYkR7HyyXiQJsxTXPekFWAPuPtf9ekPezb5licXh6fmt2pAFXrwnkpnQ1eC6MFlzU3LjC0xgDTzlmodQ3WBKOcCMCER6dBKCuY5UYbo6o1uf_bXiHdlOI7pO_uF9RdQdUPHz5Gbg</recordid><startdate>20231205</startdate><enddate>20231205</enddate><creator>Chou, Yi-Hui</creator><creator>Chang, Kalvin</creator><creator>Wu, Meng-Ju</creator><creator>Ou, Winston</creator><creator>Bi, Alice Wen-Hsin</creator><creator>Yang, Carol</creator><creator>Chen, Bryan Y</creator><creator>Pai, Rong-Wei</creator><creator>Yeh, Po-Yen</creator><creator>Chiang, Jo-Peng</creator><creator>Phoann, Iu-Tshian</creator><creator>Chang, Winnie</creator><creator>Cui, Chenxuan</creator><creator>Chen, Noel</creator><creator>Shi, Jiatong</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20231205</creationdate><title>Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus</title><author>Chou, Yi-Hui ; Chang, Kalvin ; Wu, Meng-Ju ; Ou, Winston ; Bi, Alice Wen-Hsin ; Yang, Carol ; Chen, Bryan Y ; Pai, Rong-Wei ; Yeh, Po-Yen ; Chiang, Jo-Peng ; Phoann, Iu-Tshian ; Chang, Winnie ; Cui, Chenxuan ; Chen, Noel ; Shi, Jiatong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a678-cd55cbfd8298215719cf0eb99a08ccba787ab9f96c2fa02dec8a26b20b1989963</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Computation and Language</topic><topic>Computer Science - Sound</topic><toplevel>online_resources</toplevel><creatorcontrib>Chou, Yi-Hui</creatorcontrib><creatorcontrib>Chang, 
Kalvin</creatorcontrib><creatorcontrib>Wu, Meng-Ju</creatorcontrib><creatorcontrib>Ou, Winston</creatorcontrib><creatorcontrib>Bi, Alice Wen-Hsin</creatorcontrib><creatorcontrib>Yang, Carol</creatorcontrib><creatorcontrib>Chen, Bryan Y</creatorcontrib><creatorcontrib>Pai, Rong-Wei</creatorcontrib><creatorcontrib>Yeh, Po-Yen</creatorcontrib><creatorcontrib>Chiang, Jo-Peng</creatorcontrib><creatorcontrib>Phoann, Iu-Tshian</creatorcontrib><creatorcontrib>Chang, Winnie</creatorcontrib><creatorcontrib>Cui, Chenxuan</creatorcontrib><creatorcontrib>Chen, Noel</creatorcontrib><creatorcontrib>Shi, Jiatong</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Chou, Yi-Hui</au><au>Chang, Kalvin</au><au>Wu, Meng-Ju</au><au>Ou, Winston</au><au>Bi, Alice Wen-Hsin</au><au>Yang, Carol</au><au>Chen, Bryan Y</au><au>Pai, Rong-Wei</au><au>Yeh, Po-Yen</au><au>Chiang, Jo-Peng</au><au>Phoann, Iu-Tshian</au><au>Chang, Winnie</au><au>Cui, Chenxuan</au><au>Chen, Noel</au><au>Shi, Jiatong</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus</atitle><date>2023-12-05</date><risdate>2023</risdate><abstract>Taiwanese Hokkien is declining in use and status due to a language shift
towards Mandarin in Taiwan. This is partly why it is a low resource language in
NLP and speech research today. To ensure that the state of the art in speech
processing does not leave Taiwanese Hokkien behind, we contribute a 1.5-hour
dataset of Taiwanese Hokkien to ML-SUPERB's hidden set. Evaluating ML-SUPERB's
suite of self-supervised learning (SSL) speech representations on our dataset,
we find that model size does not consistently determine performance. In fact,
certain smaller models outperform larger ones. Furthermore, linguistic
alignment between pretraining data and the target language plays a crucial
role.</abstract><doi>10.48550/arxiv.2312.06668</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2312.06668 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2312_06668 |
source | arXiv.org |
subjects | Computer Science - Computation and Language ; Computer Science - Sound |
title | Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T04%3A42%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Evaluating%20Self-supervised%20Speech%20Models%20on%20a%20Taiwanese%20Hokkien%20Corpus&rft.au=Chou,%20Yi-Hui&rft.date=2023-12-05&rft_id=info:doi/10.48550/arxiv.2312.06668&rft_dat=%3Carxiv_GOX%3E2312_06668%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |