Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus

Taiwanese Hokkien is declining in use and status due to a language shift towards Mandarin in Taiwan. This is partly why it is a low-resource language in NLP and speech research today. To ensure that the state of the art in speech processing does not leave Taiwanese Hokkien behind, we contribute a 1.5-hour dataset of Taiwanese Hokkien to ML-SUPERB's hidden set. Evaluating ML-SUPERB's suite of self-supervised learning (SSL) speech representations on our dataset, we find that model size does not consistently determine performance. In fact, certain smaller models outperform larger ones. Furthermore, linguistic alignment between pretraining data and the target language plays a crucial role.
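
The record itself contains no code. As a rough illustration of the kind of evaluation the abstract describes (extracting representations from a frozen, pretrained multilingual SSL model so a lightweight downstream probe can be trained on them), here is a minimal sketch using torchaudio's XLSR-53 wav2vec 2.0 bundle. This is not the paper's ML-SUPERB pipeline; the model choice, the placeholder file name hokkien_sample.wav, and the printed layer shapes are illustrative assumptions.

```python
import torch
import torchaudio

# Multilingual wav2vec 2.0 (XLSR-53): one plausible stand-in for the SSL
# models benchmarked in ML-SUPERB, chosen here purely for illustration.
bundle = torchaudio.pipelines.WAV2VEC2_XLSR53
model = bundle.get_model().eval()

# "hokkien_sample.wav" is a hypothetical placeholder, not a corpus file.
waveform, sr = torchaudio.load("hokkien_sample.wav")
if sr != bundle.sample_rate:  # XLSR-53 expects 16 kHz input
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    # extract_features returns one (batch, frames, dim) tensor per
    # transformer layer; downstream probes are trained on these while
    # the SSL encoder itself stays frozen.
    features, _ = model.extract_features(waveform)

print(f"{len(features)} layers, last layer shape {tuple(features[-1].shape)}")
```

In an ML-SUPERB-style setup the SSL encoder stays frozen and only a small downstream head is trained per task, which is what makes comparisons across SSL models of different sizes meaningful.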

Bibliographic details
Main authors: Chou, Yi-Hui; Chang, Kalvin; Wu, Meng-Ju; Ou, Winston; Bi, Alice Wen-Hsin; Yang, Carol; Chen, Bryan Y; Pai, Rong-Wei; Yeh, Po-Yen; Chiang, Jo-Peng; Phoann, Iu-Tshian; Chang, Winnie; Cui, Chenxuan; Chen, Noel; Shi, Jiatong
Format: Article
Language: eng
Subjects: Computer Science - Computation and Language; Computer Science - Sound
Online access: Order full text
DOI: 10.48550/arxiv.2312.06668
Date: 2023-12-05
Rights: http://creativecommons.org/licenses/by/4.0
Source: arXiv.org