1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem

Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. In this case, it is more difficult for a model to learn to separate minority classes, resulting in those somet...

Bibliographic details
Main authors:	Chen, Mingjie; Zhang, Hezhao; Li, Yuanchao; Luo, Jiachen; Wu, Wen; Ma, Ziyang; Bell, Peter; Lai, Catherine; Reiss, Joshua; Wang, Lin; Woodland, Philip C; Chen, Xie; Phan, Huy; Hain, Thomas
Format:	Article
Language:	eng
Subjects:	Computer Science - Sound

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Chen, Mingjie; Zhang, Hezhao; Li, Yuanchao; Luo, Jiachen; Wu, Wen; Ma, Ziyang; Bell, Peter; Lai, Catherine; Reiss, Joshua; Wang, Lin; Woodland, Philip C; Chen, Xie; Phan, Huy; Hain, Thomas
description	Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. In this case, it is more difficult for a model to learn to separate minority classes, resulting in those sometimes being ignored or frequently misclassified. Previous work has utilised class weighted loss for training, but problems remain as it sometimes causes over-fitting for minor classes or under-fitting for major classes. This paper presents the system developed by a multi-site team for the participation in the Odyssey 2024 Emotion Recognition Challenge Track-1. The challenge data has the aforementioned properties and therefore the presented systems aimed to tackle these issues, by introducing focal loss in optimisation when applying class weighted loss. Specifically, the focal loss is further weighted by prior-based class weights. Experimental results show that combining these two approaches brings better overall performance, by sacrificing performance on major classes. The system further employs a majority voting strategy to combine the outputs of an ensemble of 7 models. The models are trained independently, using different acoustic features and loss functions - with the aim to have different properties for different data. Hence these models show different performance preferences on major classes and minor classes. The ensemble system output obtained the best performance in the challenge, ranking top-1 among 68 submissions. It also outperformed all single models in our set. On the Odyssey 2024 Emotion Recognition Challenge Task-1 data the system obtained a Macro-F1 score of 35.69% and an accuracy of 37.32%.
doi_str_mv	10.48550/arxiv.2405.20064
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2405_20064</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2405_20064</sourcerecordid><originalsourceid>FETCH-LOGICAL-a674-5de5632ffdf2d91101ea2e819f92fb8c14edd38d052a44721c50bf2dafc742323</originalsourceid><addsrcrecordid>eNotj8tOwzAURL1hgQofwAr_QILt2Hl0h6IClSq1guyjG_s6jerEKA6I_D0hZTWj0ehIh5AHzmKZK8WeYPzpvmMhmYoFY6m8JZqHiZ4caKQf3n1NnR_o5OnRzCHgTHe9X6d31L4durWXZ3AOhxZpBeHCt0voi-uGlpYOQqD7vgEHw0I8jb5x2N-RGwsu4P1_bkj1sqvKt-hwfN2Xz4cI0kxGyqBKE2GtscIUnDOOIDDnhS2EbXLNJRqT5IYpAVJmgmvFmuUKVmdSJCLZkMcrdrWsP8euh3Gu_2zr1Tb5BZXbUCY</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem</title><source>arXiv.org</source><creator>Chen, Mingjie ; Zhang, Hezhao ; Li, Yuanchao ; Luo, Jiachen ; Wu, Wen ; Ma, Ziyang ; Bell, Peter ; Lai, Catherine ; Reiss, Joshua ; Wang, Lin ; Woodland, Philip C ; Chen, Xie ; Phan, Huy ; Hain, Thomas</creator><creatorcontrib>Chen, Mingjie ; Zhang, Hezhao ; Li, Yuanchao ; Luo, Jiachen ; Wu, Wen ; Ma, Ziyang ; Bell, Peter ; Lai, Catherine ; Reiss, Joshua ; Wang, Lin ; Woodland, Philip C ; Chen, Xie ; Phan, Huy ; Hain, Thomas</creatorcontrib><description>Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. In this case, it is more difficult for a model to learn to separate minority classes, resulting in those sometimes being ignored or frequently misclassified. Previous work has utilised class weighted loss for training, but problems remain as it sometimes causes over-fitting for minor classes or under-fitting for major classes. 
This paper presents the system developed by a multi-site team for the participation in the Odyssey 2024 Emotion Recognition Challenge Track-1. The challenge data has the aforementioned properties and therefore the presented systems aimed to tackle these issues, by introducing focal loss in optimisation when applying class weighted loss. Specifically, the focal loss is further weighted by prior-based class weights. Experimental results show that combining these two approaches brings better overall performance, by sacrificing performance on major classes. The system further employs a majority voting strategy to combine the outputs of an ensemble of 7 models. The models are trained independently, using different acoustic features and loss functions - with the aim to have different properties for different data. Hence these models show different performance preferences on major classes and minor classes. The ensemble system output obtained the best performance in the challenge, ranking top-1 among 68 submissions. It also outperformed all single models in our set. 
On the Odyssey 2024 Emotion Recognition Challenge Task-1 data the system obtained a Macro-F1 score of 35.69% and an accuracy of 37.32%.</description><identifier>DOI: 10.48550/arxiv.2405.20064</identifier><language>eng</language><subject>Computer Science - Sound</subject><creationdate>2024-05</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2405.20064$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2405.20064$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Chen, Mingjie</creatorcontrib><creatorcontrib>Zhang, Hezhao</creatorcontrib><creatorcontrib>Li, Yuanchao</creatorcontrib><creatorcontrib>Luo, Jiachen</creatorcontrib><creatorcontrib>Wu, Wen</creatorcontrib><creatorcontrib>Ma, Ziyang</creatorcontrib><creatorcontrib>Bell, Peter</creatorcontrib><creatorcontrib>Lai, Catherine</creatorcontrib><creatorcontrib>Reiss, Joshua</creatorcontrib><creatorcontrib>Wang, Lin</creatorcontrib><creatorcontrib>Woodland, Philip C</creatorcontrib><creatorcontrib>Chen, Xie</creatorcontrib><creatorcontrib>Phan, Huy</creatorcontrib><creatorcontrib>Hain, Thomas</creatorcontrib><title>1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem</title><description>Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. 
In this case, it is more difficult for a model to learn to separate minority classes, resulting in those sometimes being ignored or frequently misclassified. Previous work has utilised class weighted loss for training, but problems remain as it sometimes causes over-fitting for minor classes or under-fitting for major classes. This paper presents the system developed by a multi-site team for the participation in the Odyssey 2024 Emotion Recognition Challenge Track-1. The challenge data has the aforementioned properties and therefore the presented systems aimed to tackle these issues, by introducing focal loss in optimisation when applying class weighted loss. Specifically, the focal loss is further weighted by prior-based class weights. Experimental results show that combining these two approaches brings better overall performance, by sacrificing performance on major classes. The system further employs a majority voting strategy to combine the outputs of an ensemble of 7 models. The models are trained independently, using different acoustic features and loss functions - with the aim to have different properties for different data. Hence these models show different performance preferences on major classes and minor classes. The ensemble system output obtained the best performance in the challenge, ranking top-1 among 68 submissions. It also outperformed all single models in our set. 
On the Odyssey 2024 Emotion Recognition Challenge Task-1 data the system obtained a Macro-F1 score of 35.69% and an accuracy of 37.32%.</description><subject>Computer Science - Sound</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj8tOwzAURL1hgQofwAr_QILt2Hl0h6IClSq1guyjG_s6jerEKA6I_D0hZTWj0ehIh5AHzmKZK8WeYPzpvmMhmYoFY6m8JZqHiZ4caKQf3n1NnR_o5OnRzCHgTHe9X6d31L4durWXZ3AOhxZpBeHCt0voi-uGlpYOQqD7vgEHw0I8jb5x2N-RGwsu4P1_bkj1sqvKt-hwfN2Xz4cI0kxGyqBKE2GtscIUnDOOIDDnhS2EbXLNJRqT5IYpAVJmgmvFmuUKVmdSJCLZkMcrdrWsP8euh3Gu_2zr1Tb5BZXbUCY</recordid><startdate>20240530</startdate><enddate>20240530</enddate><creator>Chen, Mingjie</creator><creator>Zhang, Hezhao</creator><creator>Li, Yuanchao</creator><creator>Luo, Jiachen</creator><creator>Wu, Wen</creator><creator>Ma, Ziyang</creator><creator>Bell, Peter</creator><creator>Lai, Catherine</creator><creator>Reiss, Joshua</creator><creator>Wang, Lin</creator><creator>Woodland, Philip C</creator><creator>Chen, Xie</creator><creator>Phan, Huy</creator><creator>Hain, Thomas</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240530</creationdate><title>1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem</title><author>Chen, Mingjie ; Zhang, Hezhao ; Li, Yuanchao ; Luo, Jiachen ; Wu, Wen ; Ma, Ziyang ; Bell, Peter ; Lai, Catherine ; Reiss, Joshua ; Wang, Lin ; Woodland, Philip C ; Chen, Xie ; Phan, Huy ; Hain, Thomas</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a674-5de5632ffdf2d91101ea2e819f92fb8c14edd38d052a44721c50bf2dafc742323</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Sound</topic><toplevel>online_resources</toplevel><creatorcontrib>Chen, Mingjie</creatorcontrib><creatorcontrib>Zhang, 
Hezhao</creatorcontrib><creatorcontrib>Li, Yuanchao</creatorcontrib><creatorcontrib>Luo, Jiachen</creatorcontrib><creatorcontrib>Wu, Wen</creatorcontrib><creatorcontrib>Ma, Ziyang</creatorcontrib><creatorcontrib>Bell, Peter</creatorcontrib><creatorcontrib>Lai, Catherine</creatorcontrib><creatorcontrib>Reiss, Joshua</creatorcontrib><creatorcontrib>Wang, Lin</creatorcontrib><creatorcontrib>Woodland, Philip C</creatorcontrib><creatorcontrib>Chen, Xie</creatorcontrib><creatorcontrib>Phan, Huy</creatorcontrib><creatorcontrib>Hain, Thomas</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Chen, Mingjie</au><au>Zhang, Hezhao</au><au>Li, Yuanchao</au><au>Luo, Jiachen</au><au>Wu, Wen</au><au>Ma, Ziyang</au><au>Bell, Peter</au><au>Lai, Catherine</au><au>Reiss, Joshua</au><au>Wang, Lin</au><au>Woodland, Philip C</au><au>Chen, Xie</au><au>Phan, Huy</au><au>Hain, Thomas</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem</atitle><date>2024-05-30</date><risdate>2024</risdate><abstract>Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. In this case, it is more difficult for a model to learn to separate minority classes, resulting in those sometimes being ignored or frequently misclassified. Previous work has utilised class weighted loss for training, but problems remain as it sometimes causes over-fitting for minor classes or under-fitting for major classes. This paper presents the system developed by a multi-site team for the participation in the Odyssey 2024 Emotion Recognition Challenge Track-1. 
The challenge data has the aforementioned properties and therefore the presented systems aimed to tackle these issues, by introducing focal loss in optimisation when applying class weighted loss. Specifically, the focal loss is further weighted by prior-based class weights. Experimental results show that combining these two approaches brings better overall performance, by sacrificing performance on major classes. The system further employs a majority voting strategy to combine the outputs of an ensemble of 7 models. The models are trained independently, using different acoustic features and loss functions - with the aim to have different properties for different data. Hence these models show different performance preferences on major classes and minor classes. The ensemble system output obtained the best performance in the challenge, ranking top-1 among 68 submissions. It also outperformed all single models in our set. On the Odyssey 2024 Emotion Recognition Challenge Task-1 data the system obtained a Macro-F1 score of 35.69% and an accuracy of 37.32%.</abstract><doi>10.48550/arxiv.2405.20064</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2405.20064
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2405_20064
source	arXiv.org
subjects	Computer Science - Sound
title	1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T00%3A31%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=1st%20Place%20Solution%20to%20Odyssey%20Emotion%20Recognition%20Challenge%20Task1:%20Tackling%20Class%20Imbalance%20Problem&rft.au=Chen,%20Mingjie&rft.date=2024-05-30&rft_id=info:doi/10.48550/arxiv.2405.20064&rft_dat=%3Carxiv_GOX%3E2405_20064%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
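
The description field in this record mentions two techniques: a focal loss that is further weighted by prior-based class weights, and majority voting over an ensemble of models. The record does not give the exact formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. The inverse-frequency weighting, the normalisation to mean 1, the default gamma of 2.0, the tie-breaking rule, and all function names are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def prior_class_weights(labels, num_classes):
    """Inverse-frequency class weights from label priors, scaled to mean 1.

    Assumption: the record only says "prior-based class weights"; this
    particular inverse-frequency scheme is an illustrative guess.
    """
    counts = Counter(labels)
    total = len(labels)
    # Classes absent from `labels` are treated as count 1 to avoid division by zero.
    raw = [total / max(counts.get(c, 0), 1) for c in range(num_classes)]
    mean = sum(raw) / num_classes
    return [w / mean for w in raw]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def weighted_focal_loss(logits, target, class_weights, gamma=2.0):
    """Class-weighted focal loss for one example:

        -w_t * (1 - p_t)^gamma * log(p_t)

    With gamma = 0 this reduces to class-weighted cross-entropy.
    """
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to keep log finite
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


def majority_vote(predictions):
    """Return the label predicted by most models.

    Ties go to the label seen first (an assumed tie-breaking rule).
    """
    return Counter(predictions).most_common(1)[0][0]


# Toy usage: two classes with a 3:1 imbalance, then a 7-model vote.
weights = prior_class_weights([0, 0, 0, 1], num_classes=2)
loss = weighted_focal_loss([0.0, 0.0], target=1, class_weights=weights)
vote = majority_vote([2, 0, 2, 1, 2, 0, 2])
```

In this sketch the minority class receives the larger weight, while the focal factor (1 - p_t)^gamma scales down the contribution of examples the model already classifies confidently. The vote operates on hard labels from each model, which is one common reading of "majority voting".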