Robust Speech Emotion Recognition Using CNN+LSTM Based on Stochastic Fractal Search Optimization Algorithm
One of the main challenges facing current approaches to speech emotion recognition is the lack of a dataset large enough to properly train the currently available deep learning models. Therefore, this paper proposes a new data augmentation algorithm to enrich the speech emotion dataset with more...
Saved in:
Published in: | IEEE access 2022, Vol.10, p.49265-49284 |
---|---|
Main authors: | Abdelhamid, Abdelaziz A.; El-Kenawy, El-Sayed M.; Alotaibi, Bandar; Amer, Ghada M.; Abdelkader, Mahmoud Y.; Ibrahim, Abdelhameed; Eid, Marwa Metwally |
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Full text |
container_end_page | 49284 |
---|---|
container_issue | |
container_start_page | 49265 |
container_title | IEEE access |
container_volume | 10 |
creator | Abdelhamid, Abdelaziz A.; El-Kenawy, El-Sayed M.; Alotaibi, Bandar; Amer, Ghada M.; Abdelkader, Mahmoud Y.; Ibrahim, Abdelhameed; Eid, Marwa Metwally |
description | One of the main challenges facing current approaches to speech emotion recognition is the lack of a dataset large enough to properly train the currently available deep learning models. Therefore, this paper proposes a new data augmentation algorithm to enrich the speech emotion dataset with more samples through a careful addition of noise fractions. In addition, the hyperparameters of the currently available deep learning models are either handcrafted or adjusted during the training process. However, this approach does not guarantee finding the best settings for these parameters. Therefore, we propose an optimized deep learning model in which the hyperparameters are tuned to their best settings and thus achieve better recognition results. This deep learning model consists of a convolutional neural network (CNN) composed of four local feature-learning blocks and a long short-term memory (LSTM) layer for learning local and long-term correlations in the log Mel-spectrogram of the input speech samples. To improve the performance of this deep network, the learning rate and label smoothing regularization factor are optimized using the recently emerged stochastic fractal search (SFS)-guided whale optimization algorithm (WOA). The strength of this algorithm is its ability to balance the exploration and exploitation of the search agents' positions to guarantee reaching the globally optimal solution. To prove the effectiveness of the proposed approach, four speech emotion datasets, namely IEMOCAP, Emo-DB, RAVDESS, and SAVEE, are incorporated in the conducted experiments. Experimental results confirmed the superiority of the proposed approach when compared with state-of-the-art approaches. Based on the four datasets, the achieved recognition accuracies are 98.13%, 99.76%, 99.47%, and 99.50%, respectively. Moreover, a statistical analysis of the achieved results is provided to emphasize the stability of the proposed approach. (An illustrative code sketch of this pipeline follows the record fields below.) |
doi_str_mv | 10.1109/ACCESS.2022.3172954 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2022, Vol.10, p.49265-49284 |
issn | 2169-3536 |
language | eng |
recordid | cdi_proquest_journals_2663643421 |
source | IEEE Open Access Journals; Directory of Open Access Journals; EZB Electronic Journals Library |
subjects | Algorithms; Artificial neural networks; Convolutional neural networks; Datasets; Deep learning; Emotion recognition; Emotions; Feature extraction; Fractals; Fractions; guided whale optimization algorithm; Machine learning; Optimization; Optimization algorithms; Regularization; Searching; Speech emotions; Speech recognition; Stability analysis; Statistical analysis; Statistical methods; stochastic fractal search optimization; Training |
title | Robust Speech Emotion Recognition Using CNN+LSTM Based on Stochastic Fractal Search Optimization Algorithm |
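The abstract above outlines the core pipeline: noise-based data augmentation, a CNN with four local feature-learning blocks followed by an LSTM over log Mel-spectrograms, and SFS-guided WOA tuning of the learning rate and label-smoothing factor. The sketch below is a minimal, hypothetical Keras illustration of that configuration, not the authors' code: the input shape, layer widths, dropout rate, and noise-fraction scale are assumptions, and the SFS-guided WOA search itself is not implemented; the two tuned hyperparameters are simply exposed as arguments.

```python
# Illustrative sketch only (not the published implementation); shapes and
# hyperparameter defaults are assumptions chosen to keep the example runnable.
import numpy as np
import tensorflow as tf


def augment_with_noise(signal, noise_fraction=0.01, rng=None):
    # Add Gaussian noise scaled by a fraction of the signal's RMS energy.
    # The paper's exact noise-fraction schedule is not reproduced here.
    rng = rng or np.random.default_rng()
    rms = np.sqrt(np.mean(np.square(signal))) + 1e-9
    return signal + noise_fraction * rms * rng.standard_normal(signal.shape)


def build_cnn_lstm(input_shape=(128, 64, 1), n_classes=7,
                   learning_rate=1e-3, label_smoothing=0.1):
    # Four convolutional "local feature-learning" blocks over a log-Mel
    # spectrogram, followed by an LSTM over the remaining time frames.
    inputs = tf.keras.Input(shape=input_shape)              # (time, mel bins, 1)
    x = inputs
    for filters in (64, 64, 128, 128):                      # assumed widths
        x = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="elu")(x)
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
        x = tf.keras.layers.Dropout(0.3)(x)
    # Collapse the frequency axis so the LSTM sees one feature vector per frame.
    x = tf.keras.layers.Reshape((x.shape[1], x.shape[2] * x.shape[3]))(x)
    x = tf.keras.layers.LSTM(128)(x)
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss=tf.keras.losses.CategoricalCrossentropy(label_smoothing=label_smoothing),
        metrics=["accuracy"],
    )
    return model


if __name__ == "__main__":
    build_cnn_lstm().summary()
```

In a reproduction, an outer metaheuristic such as the SFS-guided WOA described in the abstract would repeatedly call `build_cnn_lstm` with candidate (learning rate, label smoothing) pairs and score each candidate on a validation split.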