Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

Bibliographic Details
Published in: arXiv.org 2024-11
Main authors: Chien-yu, Huang; Chen, Wei-Chih; Shu-wen, Yang; Liu, Andy T; Chen-An, Li; Yu-Xiang, Lin; Wei-Cheng, Tseng; Diwan, Anuj; Yi-Jen Shih; Shi, Jiatong; Chen, William; Chen, Xuanjun; Chi-Yuan, Hsiao; Peng, Puyuan; Shih-Heng, Wang; Chun-Yi, Kuan; Ke-Han, Lu; Kai-Wei, Chang; Chih-Kai, Yang; Ritter-Gutierrez, Fabian; Ming To Chuang; Kuan-Po Huang; Arora, Siddhant; You-Kuan, Lin; Yeo, Eunjung; Chang, Kalvin; Chung-Ming, Chien; Choi, Kwanghee; Cheng-Hsiu Hsieh; Yi-Cheng, Lin; Chee-En Yu; I-Hsiang, Chiu; Guimarães, Heitor R; Han, Jionghao; Lin, Tzu-Quan; Lin, Tzu-Yuan; Chang, Homu; Ting-Wu, Chang; Chun Wei Chen; Shou-Jen Chen; Yu-Hua, Chen; Hsi-Chun, Cheng; Dhawan, Kunal; Jia-Lin, Fang; Shi-Xin, Fang; Kuan-Yu, Fang Chiang; Chi An Fu; Hsien-Fu Hsiao; Ching Yu Hsu; Huang, Shao-Syuan; Lee Chen Wei; Hsi-Che Lin; Hsuan-Hao, Lin; Hsuan-Ting, Lin; Jian-Ren, Lin; Ting-Chun, Liu; Li-Chun, Lu; Tsung-Min Pai; Pasad, Ankita; Shih-Yun, Shan Kuan; Shon, Suwon; Tang, Yuxun; Yun-Shao, Tsai; Jui-Chiang, Wei; Wei, Tzu-Chieh; Wu, Chengxi; Wu, Dien-Ruei; Chao-Han, Huck Yang; Chieh-Chi Yang; Jia Qi Yip; Shao-Xiang, Yuan; Noroozi, Vahid; Chen, Zhehuai; Wu, Haibin; Livescu, Karen; Harwath, David; Watanabe, Shinji; Hung-yi, Lee
Format: Article
Language: English
Subjects: Audio data; Benchmarks; Emotion recognition; Evaluation; Natural language processing; Speech
Online Access: Full text
container_title arXiv.org
description Multimodal foundation models, such as Gemini and ChatGPT, have revolutionized human-machine interactions by seamlessly integrating various forms of data. Developing a universal spoken language model that comprehends a wide range of natural language instructions is critical for bridging communication gaps and facilitating more intuitive interactions. However, the absence of a comprehensive evaluation benchmark poses a significant challenge. We present Dynamic-SUPERB Phase-2, an open and evolving benchmark for the comprehensive evaluation of instruction-based universal speech models. Building upon the first generation, this second version incorporates 125 new tasks contributed collaboratively by the global research community, expanding the benchmark to a total of 180 tasks, making it the largest benchmark for speech and audio evaluation. While the first generation of Dynamic-SUPERB was limited to classification tasks, Dynamic-SUPERB Phase-2 broadens its evaluation capabilities by introducing a wide array of novel and diverse tasks, including regression and sequence generation, across speech, music, and environmental audio. Evaluation results indicate that none of the models performed well universally. SALMONN-13B excelled in English ASR, while WavLLM demonstrated high accuracy in emotion recognition, but current models still require further innovations to handle a broader range of tasks. We will soon open-source all task data and the evaluation pipeline.
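The description above characterizes Dynamic-SUPERB Phase-2 as a collection of instruction-based tasks: each item pairs an audio clip (speech, music, or environmental audio) with a natural-language prompt and a reference answer, and models are scored per task. As a rough illustration only, the Python sketch below shows one way such a task instance could be represented and scored; the TaskInstance fields, the evaluate_task helper, the dummy model, and the exact-match metric are hypothetical assumptions, not the benchmark's actual data format or evaluation pipeline.

```python
# Hypothetical sketch of an instruction-based evaluation loop.
# All names and interfaces here are illustrative assumptions, not the
# actual Dynamic-SUPERB Phase-2 data format or evaluation pipeline.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TaskInstance:
    instruction: str   # natural-language prompt, e.g. "Transcribe the speech."
    audio_path: str    # path to a speech / music / environmental-audio clip
    reference: str     # ground-truth label, transcript, or numeric target


def exact_match(prediction: str, reference: str) -> float:
    """Toy metric for classification-style tasks (real tasks may use WER, MAE, etc.)."""
    return float(prediction.strip().lower() == reference.strip().lower())


def evaluate_task(instances: List[TaskInstance],
                  model_generate: Callable[[str, str], str],
                  metric: Callable[[str, str], float] = exact_match) -> float:
    """Average a metric over one task; a benchmark would repeat this for every task."""
    scores = [metric(model_generate(x.instruction, x.audio_path), x.reference)
              for x in instances]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # A dummy "model" that ignores the audio, just to show the call pattern.
    dummy = lambda instruction, audio_path: "happy"
    demo = [TaskInstance("What emotion does the speaker convey?",
                         "clip_001.wav", "happy")]
    print(f"accuracy = {evaluate_task(demo, dummy):.2f}")
```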
format Article
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-11
issn 2331-8422
language eng
recordid cdi_proquest_journals_3126807398
source Freely Accessible Journals
subjects Audio data
Benchmarks
Emotion recognition
Evaluation
Natural language processing
Speech
title Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T10%3A23%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Dynamic-SUPERB%20Phase-2:%20A%20Collaboratively%20Expanding%20Benchmark%20for%20Measuring%20the%20Capabilities%20of%20Spoken%20Language%20Models%20with%20180%20Tasks&rft.jtitle=arXiv.org&rft.au=Chien-yu,%20Huang&rft.date=2024-11-08&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3126807398%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3126807398&rft_id=info:pmid/&rfr_iscdi=true