Unveiling Hidden Factors: Explainable AI for Feature Boosting in Speech Emotion Recognition


Detailed description

Bibliographic details
Published in: arXiv.org, 2024-06
Main authors: Nfissi, Alaa; Bouachir, Wassim; Bouguila, Nizar; Mishara, Brian
Format: Article
Language: English
Online access: Full text
Description: Speech emotion recognition (SER) has gained significant attention due to its many application fields, such as mental health, education, and human-computer interaction. However, the accuracy of SER systems is hindered by high-dimensional feature sets that may contain irrelevant and redundant information. To overcome this challenge, this study proposes an iterative feature boosting approach for SER that emphasizes feature relevance and explainability to enhance machine learning model performance. Our approach involves meticulous feature selection and analysis to build efficient SER systems. In addressing our main problem through model explainability, we employ a feature evaluation loop with Shapley values to iteratively refine feature sets. This process strikes a balance between model performance and transparency, enabling a comprehensive understanding of the model's predictions. The proposed approach offers several advantages, including the identification and removal of irrelevant and redundant features, leading to a more effective model. Additionally, it promotes explainability, facilitating comprehension of the model's predictions and the identification of crucial features for emotion determination. The effectiveness of the proposed method is validated on the SER benchmarks of the Toronto emotional speech set (TESS), Berlin Database of Emotional Speech (EMO-DB), Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and Surrey Audio-Visual Expressed Emotion (SAVEE) datasets, outperforming state-of-the-art methods. These results highlight the potential of the proposed technique in developing accurate and explainable SER systems. To the best of our knowledge, this is the first work to incorporate model explainability into an SER framework.
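The Shapley-value feature-evaluation loop described in the abstract can be sketched in pure Python. This is an illustrative toy, not the authors' implementation: the value function standing in for validation accuracy, the feature names, and the pruning threshold are all hypothetical assumptions.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley value of each feature under coalition value function `value`."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for r in range(n):
            for subset in combinations(others, r):
                s = frozenset(subset)
                # Shapley weight for a coalition of size r among n players
                weight = factorial(r) * factorial(n - r - 1) / factorial(n)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

def boost_features(features, value, threshold=0.01, max_iters=10):
    """Iteratively drop features whose Shapley contribution falls below threshold."""
    current = set(features)
    for _ in range(max_iters):
        phi = shapley_values(sorted(current), value)
        weak = {f for f, v in phi.items() if v < threshold}
        if not weak or weak == current:
            break
        current -= weak
    return current

# Toy value function standing in for validation accuracy: "mfcc" and
# "pitch" are informative, "noise" contributes nothing (all hypothetical).
def toy_accuracy(subset):
    return 0.5 + 0.2 * ("mfcc" in subset) + 0.1 * ("pitch" in subset)

kept = boost_features(["mfcc", "pitch", "noise"], toy_accuracy)
print(sorted(kept))  # → ['mfcc', 'pitch']: the uninformative feature is pruned
```

A real SER pipeline would replace `toy_accuracy` with retraining and validating a classifier on each feature subset, and would typically approximate Shapley values (exact enumeration is exponential in the number of features).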
DOI: 10.48550/arxiv.2406.01624
Publisher: Cornell University Library, arXiv.org (Ithaca)
Rights: Published under the Creative Commons Attribution 4.0 license (http://creativecommons.org/licenses/by/4.0/)
Published version: https://doi.org/10.1007/s10489-024-05536-5
EISSN: 2331-8422
Source: Freely Accessible Journals; arXiv.org
Subjects: Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Computer Science - Learning
Computer Science - Sound
Emotion recognition
Emotions
Explainable artificial intelligence
Machine learning
Speech recognition