Unveiling Hidden Factors: Explainable AI for Feature Boosting in Speech Emotion Recognition
Speech emotion recognition (SER) has gained significant attention due to its several application fields, such as mental health, education, and human-computer interaction. However, the accuracy of SER systems is hindered by high-dimensional feature sets that may contain irrelevant and redundant information.
Published in: | arXiv.org 2024-06 |
---|---|
Main Authors: | Nfissi, Alaa; Bouachir, Wassim; Bouguila, Nizar; Mishara, Brian |
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Nfissi, Alaa Bouachir, Wassim Bouguila, Nizar Mishara, Brian |
description | Speech emotion recognition (SER) has gained significant attention due to its several application fields, such as mental health, education, and human-computer interaction. However, the accuracy of SER systems is hindered by high-dimensional feature sets that may contain irrelevant and redundant information. To overcome this challenge, this study proposes an iterative feature boosting approach for SER that emphasizes feature relevance and explainability to enhance machine learning model performance. Our approach involves meticulous feature selection and analysis to build efficient SER systems. In addressing our main problem through model explainability, we employ a feature evaluation loop with Shapley values to iteratively refine feature sets. This process strikes a balance between model performance and transparency, which enables a comprehensive understanding of the model's predictions. The proposed approach offers several advantages, including the identification and removal of irrelevant and redundant features, leading to a more effective model. Additionally, it promotes explainability, facilitating comprehension of the model's predictions and the identification of crucial features for emotion determination. The effectiveness of the proposed method is validated on the SER benchmarks of the Toronto emotional speech set (TESS), Berlin Database of Emotional Speech (EMO-DB), Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and Surrey Audio-Visual Expressed Emotion (SAVEE) datasets, outperforming state-of-the-art methods. These results highlight the potential of the proposed technique in developing accurate and explainable SER systems. To the best of our knowledge, this is the first work to incorporate model explainability into an SER framework. |
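The description above outlines the core loop of the paper's feature boosting: train a model on the current feature set, attribute its predictions to individual features with Shapley values, prune low-contribution features, and repeat while performance holds. Below is a minimal sketch of that idea, assuming tabular acoustic features (e.g., MFCC or prosodic statistics) and using a random-forest classifier with the `shap` library; the synthetic data, pruning threshold, and stopping rule are illustrative assumptions, not the authors' exact pipeline.

```python
# Minimal sketch of a SHAP-guided iterative feature-refinement loop for SER.
# Illustrative only: the data, model choice, 20% pruning rule, and stopping
# criterion are assumptions, not the pipeline described in the paper.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 60))      # placeholder acoustic features (e.g., MFCC stats)
y = rng.integers(0, 4, size=400)    # placeholder emotion labels (4 classes)

features = list(range(X.shape[1]))  # start from the full feature set
best_score, best_features = -np.inf, features

for _ in range(10):                 # cap the number of refinement rounds
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    score = cross_val_score(model, X[:, features], y, cv=5).mean()
    if score > best_score:
        best_score, best_features = score, features
    model.fit(X[:, features], y)

    # Shapley-value attribution: mean |SHAP| per feature, averaged over samples/classes.
    sv = shap.TreeExplainer(model).shap_values(X[:, features])
    if isinstance(sv, list):        # older shap versions: one array per class
        importance = np.mean([np.abs(s).mean(axis=0) for s in sv], axis=0)
    else:                           # newer shap versions: (n_samples, n_features[, n_classes])
        importance = np.abs(sv).mean(axis=tuple(a for a in range(sv.ndim) if a != 1))

    # Prune the least-contributing features (assumed rule: drop the bottom 20%).
    keep = importance >= np.quantile(importance, 0.20)
    if keep.all() or keep.sum() < 5:  # stop when nothing is pruned or too few remain
        break
    features = [f for f, k in zip(features, keep) if k]

print(f"kept {len(best_features)} features, CV accuracy ~ {best_score:.3f}")
```

The sketch only illustrates the pruning mechanics; the paper's framework additionally uses the Shapley values for transparency, i.e., to explain which acoustic features drive each emotion prediction.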
doi_str_mv | 10.48550/arxiv.2406.01624 |
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-06 |
issn | 2331-8422 |
language | eng |
recordid | cdi_arxiv_primary_2406_01624 |
source | Freely Accessible Journals; arXiv.org |
subjects | Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning; Computer Science - Sound; Emotion recognition; Emotions; Explainable artificial intelligence; Machine learning; Speech recognition |
title | Unveiling Hidden Factors: Explainable AI for Feature Boosting in Speech Emotion Recognition |