Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
Multimodal foundation models, such as Gemini and ChatGPT, have revolutionized human-machine interactions by seamlessly integrating various forms of data. Developing a universal spoken language model that comprehends a wide range of natural language instructions is critical for bridging communication gaps and facilitating more intuitive interactions.
Saved in:
Published in: | arXiv.org, 2024-11 |
---|---|
Main authors: | Chien-yu, Huang; Chen, Wei-Chih; Shu-wen, Yang; Liu, Andy T; Chen-An, Li; Yu-Xiang, Lin; Wei-Cheng, Tseng; Diwan, Anuj; Yi-Jen Shih; Shi, Jiatong; Chen, William; Chen, Xuanjun; Chi-Yuan, Hsiao; Peng, Puyuan; Shih-Heng, Wang; Chun-Yi, Kuan; Ke-Han, Lu; Kai-Wei, Chang; Chih-Kai, Yang; Ritter-Gutierrez, Fabian; Ming To Chuang; Kuan-Po Huang; Arora, Siddhant; You-Kuan, Lin; Yeo, Eunjung; Chang, Kalvin; Chung-Ming, Chien; Choi, Kwanghee; Cheng-Hsiu Hsieh; Yi-Cheng, Lin; Chee-En Yu; I-Hsiang, Chiu; Guimarães, Heitor R; Han, Jionghao; Lin, Tzu-Quan; Lin, Tzu-Yuan; Chang, Homu; Ting-Wu, Chang; Chun Wei Chen; Shou-Jen Chen; Yu-Hua, Chen; Hsi-Chun, Cheng; Dhawan, Kunal; Jia-Lin, Fang; Shi-Xin, Fang; Kuan-Yu, Fang Chiang; Chi An Fu; Hsien-Fu Hsiao; Ching Yu Hsu; Huang, Shao-Syuan; Lee Chen Wei; Hsi-Che Lin; Hsuan-Hao, Lin; Hsuan-Ting, Lin; Jian-Ren, Lin; Ting-Chun, Liu; Li-Chun, Lu; Tsung-Min Pai; Pasad, Ankita; Shih-Yun, Shan Kuan; Shon, Suwon; Tang, Yuxun; Yun-Shao, Tsai; Jui-Chiang, Wei; Wei, Tzu-Chieh; Wu, Chengxi; Wu, Dien-Ruei; Chao-Han, Huck Yang; Chieh-Chi Yang; Jia Qi Yip; Shao-Xiang, Yuan; Noroozi, Vahid; Chen, Zhehuai; Wu, Haibin; Livescu, Karen; Harwath, David; Watanabe, Shinji; Hung-yi, Lee |
Format: | Article |
Language: | eng |
Subjects: | Audio data; Benchmarks; Emotion recognition; Evaluation; Natural language processing; Speech |
Online access: | Full text |
---|---|
container_title | arXiv.org |
creator | Chien-yu, Huang; Chen, Wei-Chih; Shu-wen, Yang; Liu, Andy T; Chen-An, Li; Yu-Xiang, Lin; Wei-Cheng, Tseng; Diwan, Anuj; Yi-Jen Shih; Shi, Jiatong; Chen, William; Chen, Xuanjun; Chi-Yuan, Hsiao; Peng, Puyuan; Shih-Heng, Wang; Chun-Yi, Kuan; Ke-Han, Lu; Kai-Wei, Chang; Chih-Kai, Yang; Ritter-Gutierrez, Fabian; Ming To Chuang; Kuan-Po Huang; Arora, Siddhant; You-Kuan, Lin; Yeo, Eunjung; Chang, Kalvin; Chung-Ming, Chien; Choi, Kwanghee; Cheng-Hsiu Hsieh; Yi-Cheng, Lin; Chee-En Yu; I-Hsiang, Chiu; Guimarães, Heitor R; Han, Jionghao; Lin, Tzu-Quan; Lin, Tzu-Yuan; Chang, Homu; Ting-Wu, Chang; Chun Wei Chen; Shou-Jen Chen; Yu-Hua, Chen; Hsi-Chun, Cheng; Dhawan, Kunal; Jia-Lin, Fang; Shi-Xin, Fang; Kuan-Yu, Fang Chiang; Chi An Fu; Hsien-Fu Hsiao; Ching Yu Hsu; Huang, Shao-Syuan; Lee Chen Wei; Hsi-Che Lin; Hsuan-Hao, Lin; Hsuan-Ting, Lin; Jian-Ren, Lin; Ting-Chun, Liu; Li-Chun, Lu; Tsung-Min Pai; Pasad, Ankita; Shih-Yun, Shan Kuan; Shon, Suwon; Tang, Yuxun; Yun-Shao, Tsai; Jui-Chiang, Wei; Wei, Tzu-Chieh; Wu, Chengxi; Wu, Dien-Ruei; Chao-Han, Huck Yang; Chieh-Chi Yang; Jia Qi Yip; Shao-Xiang, Yuan; Noroozi, Vahid; Chen, Zhehuai; Wu, Haibin; Livescu, Karen; Harwath, David; Watanabe, Shinji; Hung-yi, Lee |
description | Multimodal foundation models, such as Gemini and ChatGPT, have revolutionized human-machine interactions by seamlessly integrating various forms of data. Developing a universal spoken language model that comprehends a wide range of natural language instructions is critical for bridging communication gaps and facilitating more intuitive interactions. However, the absence of a comprehensive evaluation benchmark poses a significant challenge. We present Dynamic-SUPERB Phase-2, an open and evolving benchmark for the comprehensive evaluation of instruction-based universal speech models. Building upon the first generation, this second version incorporates 125 new tasks contributed collaboratively by the global research community, expanding the benchmark to a total of 180 tasks, making it the largest benchmark for speech and audio evaluation. While the first generation of Dynamic-SUPERB was limited to classification tasks, Dynamic-SUPERB Phase-2 broadens its evaluation capabilities by introducing a wide array of novel and diverse tasks, including regression and sequence generation, across speech, music, and environmental audio. Evaluation results indicate that none of the models performed well universally. SALMONN-13B excelled in English ASR, while WavLLM demonstrated high accuracy in emotion recognition, but current models still require further innovations to handle a broader range of tasks. We will soon open-source all task data and the evaluation pipeline. |
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-11 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3126807398 |
source | Freely Accessible Journals |
subjects | Audio data; Benchmarks; Emotion recognition; Evaluation; Natural language processing; Speech |
title | Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T10%3A23%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Dynamic-SUPERB%20Phase-2:%20A%20Collaboratively%20Expanding%20Benchmark%20for%20Measuring%20the%20Capabilities%20of%20Spoken%20Language%20Models%20with%20180%20Tasks&rft.jtitle=arXiv.org&rft.au=Chien-yu,%20Huang&rft.date=2024-11-08&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3126807398%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3126807398&rft_id=info:pmid/&rfr_iscdi=true |