Robust Speech Emotion Recognition Using CNN+LSTM Based on Stochastic Fractal Search Optimization Algorithm

Bibliographic Details
Published in: IEEE Access, 2022, Vol. 10, pp. 49265-49284
Authors: Abdelhamid, Abdelaziz A.; El-Kenawy, El-Sayed M.; Alotaibi, Bandar; Amer, Ghada M.; Abdelkader, Mahmoud Y.; Ibrahim, Abdelhameed; Eid, Marwa Metwally
Format: Article
Language: English
Online Access: Full text
Abstract: One of the main challenges facing current approaches to speech emotion recognition is the lack of datasets large enough to properly train the available deep learning models. This paper therefore proposes a new data augmentation algorithm that enriches speech emotion datasets with more samples through the careful addition of noise fractions. In addition, the hyperparameters of current deep learning models are either handcrafted or adjusted during training, which does not guarantee finding their best settings. We therefore propose an optimized deep learning model in which the hyperparameters are tuned to their best settings, yielding better recognition results. The model consists of a convolutional neural network (CNN) composed of four local feature-learning blocks and a long short-term memory (LSTM) layer that learn local and long-term correlations in the log Mel-spectrogram of the input speech samples. To improve the performance of this deep network, the learning rate and the label smoothing regularization factor are optimized using the recently emerged stochastic fractal search (SFS)-guided whale optimization algorithm (WOA), whose strength is its ability to balance the exploration and exploitation of the search agents' positions and thus reach the global optimum. To demonstrate the effectiveness of the proposed approach, four speech emotion datasets, namely IEMOCAP, Emo-DB, RAVDESS, and SAVEE, are used in the experiments. The results confirm the superiority of the proposed approach over state-of-the-art methods, with recognition accuracies of 98.13%, 99.76%, 99.47%, and 99.50% on the four datasets, respectively. A statistical analysis of the results is also provided to emphasize the stability of the proposed approach.
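Since the abstract only outlines the model, the following minimal Python/PyTorch sketch illustrates the two ingredients it describes: a noise-fraction augmentation step and a CNN with four local feature-learning blocks feeding an LSTM over the time axis of a log Mel-spectrogram. The channel sizes, pooling shapes, hidden size, class count, and the add_noise_fraction scaling scheme are illustrative assumptions, not the authors' published configuration.

import torch
import torch.nn as nn

def add_noise_fraction(log_mel: torch.Tensor, fraction: float = 0.05) -> torch.Tensor:
    # Add Gaussian noise scaled to a small fraction of each sample's mean magnitude.
    # This scaling scheme is a hypothetical stand-in for the paper's augmentation step.
    scale = fraction * log_mel.abs().mean(dim=(-2, -1), keepdim=True)
    return log_mel + scale * torch.randn_like(log_mel)

class CNNLSTMEmotionNet(nn.Module):
    # Four convolutional "local feature-learning blocks" followed by an LSTM over time.
    def __init__(self, n_mels: int = 64, n_classes: int = 7, hidden: int = 128):
        super().__init__()
        blocks, in_ch = [], 1
        for out_ch in (16, 32, 64, 64):            # four local feature-learning blocks
            blocks += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.MaxPool2d((2, 1)),              # pool the frequency axis, keep time resolution
            ]
            in_ch = out_ch
        self.cnn = nn.Sequential(*blocks)
        feat_dim = 64 * (n_mels // 16)             # channels x remaining mel bins
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_mels, time) log Mel-spectrogram
        f = self.cnn(x)                            # (batch, 64, n_mels // 16, time)
        f = f.permute(0, 3, 1, 2).flatten(2)       # (batch, time, feat_dim)
        _, (h, _) = self.lstm(f)                   # last hidden state summarises the utterance
        return self.classifier(h[-1])              # (batch, n_classes)

if __name__ == "__main__":
    x = torch.randn(4, 1, 64, 120)                 # toy batch of log Mel-spectrograms
    logits = CNNLSTMEmotionNet()(add_noise_fraction(x))
    print(logits.shape)                            # torch.Size([4, 7])

In a training loop built on this sketch, the label smoothing factor that the paper tunes with SFS-guided WOA would correspond to the label_smoothing argument of nn.CrossEntropyLoss in recent PyTorch versions, and the tuned learning rate would be passed to the optimizer.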
DOI: 10.1109/ACCESS.2022.3172954
ISSN: 2169-3536
EISSN: 2169-3536
Source: IEEE Open Access Journals; Directory of Open Access Journals; EZB Electronic Journals Library
Subjects: Algorithms
Artificial neural networks
Convolutional neural networks
Datasets
Deep learning
Emotion recognition
Emotions
Feature extraction
Fractals
Fractions
guided whale optimization algorithm
Machine learning
Optimization
Optimization algorithms
Regularization
Searching
Speech emotions
Speech recognition
Stability analysis
Statistical analysis
Statistical methods
stochastic fractal search optimization
Training