Robust Speech Emotion Recognition Using CNN+LSTM Based on Stochastic Fractal Search Optimization Algorithm

Bibliographic Details
Published in: IEEE Access, 2022, Vol. 10, pp. 49265-49284
Authors: Abdelhamid, Abdelaziz A.; El-Kenawy, El-Sayed M.; Alotaibi, Bandar; Amer, Ghada M.; Abdelkader, Mahmoud Y.; Ibrahim, Abdelhameed; Eid, Marwa Metwally
Format: Article
Language: English
Online Access: Full text
Abstract: One of the main challenges facing current approaches to speech emotion recognition is the lack of datasets large enough to properly train the available deep learning models. This paper therefore proposes a new data augmentation algorithm that enriches speech emotion datasets with more samples through the careful addition of noise fractions. In addition, the hyperparameters of current deep learning models are either handcrafted or adjusted during training, which does not guarantee finding their best settings. We therefore propose an optimized deep learning model in which the hyperparameters are tuned to their best settings, yielding better recognition results. The model consists of a convolutional neural network (CNN) composed of four local feature-learning blocks and a long short-term memory (LSTM) layer that learn local and long-term correlations in the log Mel-spectrogram of the input speech samples. To improve the performance of this deep network, the learning rate and the label smoothing regularization factor are optimized using the recently emerged stochastic fractal search (SFS)-guided whale optimization algorithm (WOA), whose strength is its ability to balance the exploration and exploitation of the search agents' positions and thus reach the global optimum. To demonstrate the effectiveness of the proposed approach, four speech emotion datasets, namely IEMOCAP, Emo-DB, RAVDESS, and SAVEE, are used in the experiments. The results confirm the superiority of the proposed approach over state-of-the-art methods, with recognition accuracies of 98.13%, 99.76%, 99.47%, and 99.50% on the four datasets, respectively. A statistical analysis of the results is also provided to emphasize the stability of the proposed approach.
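Since the abstract only outlines the model, the following minimal Python/PyTorch sketch illustrates the two ingredients it describes: a noise-fraction augmentation step and a CNN with four local feature-learning blocks feeding an LSTM over the time axis of a log Mel-spectrogram. The channel sizes, pooling shapes, hidden size, class count, and the add_noise_fraction scaling scheme are illustrative assumptions, not the authors' published configuration.

import torch
import torch.nn as nn

def add_noise_fraction(log_mel: torch.Tensor, fraction: float = 0.05) -> torch.Tensor:
    # Add Gaussian noise scaled to a small fraction of each sample's mean magnitude.
    # This scaling scheme is a hypothetical stand-in for the paper's augmentation step.
    scale = fraction * log_mel.abs().mean(dim=(-2, -1), keepdim=True)
    return log_mel + scale * torch.randn_like(log_mel)

class CNNLSTMEmotionNet(nn.Module):
    # Four convolutional "local feature-learning blocks" followed by an LSTM over time.
    def __init__(self, n_mels: int = 64, n_classes: int = 7, hidden: int = 128):
        super().__init__()
        blocks, in_ch = [], 1
        for out_ch in (16, 32, 64, 64):            # four local feature-learning blocks
            blocks += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.MaxPool2d((2, 1)),              # pool the frequency axis, keep time resolution
            ]
            in_ch = out_ch
        self.cnn = nn.Sequential(*blocks)
        feat_dim = 64 * (n_mels // 16)             # channels x remaining mel bins
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_mels, time) log Mel-spectrogram
        f = self.cnn(x)                            # (batch, 64, n_mels // 16, time)
        f = f.permute(0, 3, 1, 2).flatten(2)       # (batch, time, feat_dim)
        _, (h, _) = self.lstm(f)                   # last hidden state summarises the utterance
        return self.classifier(h[-1])              # (batch, n_classes)

if __name__ == "__main__":
    x = torch.randn(4, 1, 64, 120)                 # toy batch of log Mel-spectrograms
    logits = CNNLSTMEmotionNet()(add_noise_fraction(x))
    print(logits.shape)                            # torch.Size([4, 7])

In a training loop built on this sketch, the label smoothing factor that the paper tunes with SFS-guided WOA would correspond to the label_smoothing argument of nn.CrossEntropyLoss in recent PyTorch versions, and the tuned learning rate would be passed to the optimizer.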
DOI: 10.1109/ACCESS.2022.3172954
ISSN: 2169-3536
EISSN: 2169-3536
Source: IEEE Open Access Journals; Directory of Open Access Journals; EZB Electronic Journals Library
Subjects: Algorithms
Artificial neural networks
Convolutional neural networks
Datasets
Deep learning
Emotion recognition
Emotions
Feature extraction
Fractals
Fractions
guided whale optimization algorithm
Machine learning
Optimization
Optimization algorithms
Regularization
Searching
Speech emotions
Speech recognition
Stability analysis
Statistical analysis
Statistical methods
stochastic fractal search optimization
Training