Unveiling Hidden Factors: Explainable AI for Feature Boosting in Speech Emotion Recognition


Detailed description

Bibliographic details
Published in: arXiv.org, 2024-06
Main authors: Nfissi, Alaa; Bouachir, Wassim; Bouguila, Nizar; Mishara, Brian
Format: Article
Language: English
Online access: Full text
Description: Speech emotion recognition (SER) has gained significant attention due to its many application fields, such as mental health, education, and human-computer interaction. However, the accuracy of SER systems is hindered by high-dimensional feature sets that may contain irrelevant and redundant information. To overcome this challenge, this study proposes an iterative feature boosting approach for SER that emphasizes feature relevance and explainability to enhance machine learning model performance. Our approach involves meticulous feature selection and analysis to build efficient SER systems. In addressing our main problem through model explainability, we employ a feature evaluation loop with Shapley values to iteratively refine feature sets. This process strikes a balance between model performance and transparency, enabling a comprehensive understanding of the model's predictions. The proposed approach offers several advantages, including the identification and removal of irrelevant and redundant features, leading to a more effective model. Additionally, it promotes explainability, facilitating comprehension of the model's predictions and the identification of crucial features for emotion determination. The effectiveness of the proposed method is validated on the SER benchmarks of the Toronto emotional speech set (TESS), Berlin Database of Emotional Speech (EMO-DB), Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and Surrey Audio-Visual Expressed Emotion (SAVEE) datasets, outperforming state-of-the-art methods. These results highlight the potential of the proposed technique in developing accurate and explainable SER systems. To the best of our knowledge, this is the first work to incorporate model explainability into an SER framework.
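The Shapley-value feature-evaluation loop described in the abstract can be sketched in pure Python. This is an illustrative toy, not the authors' implementation: the value function standing in for validation accuracy, the feature names, and the pruning threshold are all hypothetical assumptions.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley value of each feature under coalition value function `value`."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for r in range(n):
            for subset in combinations(others, r):
                s = frozenset(subset)
                # Shapley weight for a coalition of size r among n players
                weight = factorial(r) * factorial(n - r - 1) / factorial(n)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

def boost_features(features, value, threshold=0.01, max_iters=10):
    """Iteratively drop features whose Shapley contribution falls below threshold."""
    current = set(features)
    for _ in range(max_iters):
        phi = shapley_values(sorted(current), value)
        weak = {f for f, v in phi.items() if v < threshold}
        if not weak or weak == current:
            break
        current -= weak
    return current

# Toy value function standing in for validation accuracy: "mfcc" and
# "pitch" are informative, "noise" contributes nothing (all hypothetical).
def toy_accuracy(subset):
    return 0.5 + 0.2 * ("mfcc" in subset) + 0.1 * ("pitch" in subset)

kept = boost_features(["mfcc", "pitch", "noise"], toy_accuracy)
print(sorted(kept))  # → ['mfcc', 'pitch']: the uninformative feature is pruned
```

A real SER pipeline would replace `toy_accuracy` with retraining and validating a classifier on each feature subset, and would typically approximate Shapley values (exact enumeration is exponential in the number of features).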
DOI: 10.48550/arxiv.2406.01624
Publisher: Cornell University Library, arXiv.org (Ithaca)
Rights: Published under the Creative Commons Attribution 4.0 license (http://creativecommons.org/licenses/by/4.0/)
Published version: https://doi.org/10.1007/s10489-024-05536-5
EISSN: 2331-8422
Source: Freely Accessible Journals; arXiv.org
Subjects: Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Computer Science - Learning
Computer Science - Sound
Emotion recognition
Emotions
Explainable artificial intelligence
Machine learning
Speech recognition