Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus

Taiwanese Hokkien is declining in use and status due to a language shift towards Mandarin in Taiwan. This is partly why it is a low-resource language in NLP and speech research today. To ensure that the state of the art in speech processing does not leave Taiwanese Hokkien behind, we contribute a 1.5-hour dataset of Taiwanese Hokkien to ML-SUPERB's hidden set. Evaluating ML-SUPERB's suite of self-supervised learning (SSL) speech representations on our dataset, we find that model size does not consistently determine performance. In fact, certain smaller models outperform larger ones. Furthermore, linguistic alignment between pretraining data and the target language plays a crucial role.
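
The record itself contains no code. As a rough illustration of the kind of evaluation the abstract describes (extracting representations from a frozen, pretrained multilingual SSL model so a lightweight downstream probe can be trained on them), here is a minimal sketch using torchaudio's XLSR-53 wav2vec 2.0 bundle. This is not the paper's ML-SUPERB pipeline; the model choice, the placeholder file name hokkien_sample.wav, and the printed layer shapes are illustrative assumptions.

```python
import torch
import torchaudio

# Multilingual wav2vec 2.0 (XLSR-53): one plausible stand-in for the SSL
# models benchmarked in ML-SUPERB, chosen here purely for illustration.
bundle = torchaudio.pipelines.WAV2VEC2_XLSR53
model = bundle.get_model().eval()

# "hokkien_sample.wav" is a hypothetical placeholder, not a corpus file.
waveform, sr = torchaudio.load("hokkien_sample.wav")
if sr != bundle.sample_rate:  # XLSR-53 expects 16 kHz input
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    # extract_features returns one (batch, frames, dim) tensor per
    # transformer layer; downstream probes are trained on these while
    # the SSL encoder itself stays frozen.
    features, _ = model.extract_features(waveform)

print(f"{len(features)} layers, last layer shape {tuple(features[-1].shape)}")
```

In an ML-SUPERB-style setup the SSL encoder stays frozen and only a small downstream head is trained per task, which is what makes comparisons across SSL models of different sizes meaningful.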

Bibliographic details
Main authors: Chou, Yi-Hui; Chang, Kalvin; Wu, Meng-Ju; Ou, Winston; Bi, Alice Wen-Hsin; Yang, Carol; Chen, Bryan Y; Pai, Rong-Wei; Yeh, Po-Yen; Chiang, Jo-Peng; Phoann, Iu-Tshian; Chang, Winnie; Cui, Chenxuan; Chen, Noel; Shi, Jiatong
Format: Article
Language: eng
Subjects: Computer Science - Computation and Language; Computer Science - Sound
Online access: Order full text
DOI: 10.48550/arxiv.2312.06668
Date: 2023-12-05
Rights: http://creativecommons.org/licenses/by/4.0
Source: arXiv.org