Speech Emotion Recognition by Late Fusion for Bidirectional Reservoir Computing With Random Projection

Many researchers are inspired by studying Speech Emotion Recognition (SER) because it is considered as a key effort in Human-Computer Interaction (HCI). The main focus of this work is to design a model for emotion recognition from speech, which has plenty of challenges within it. Due to the time ser...

Bibliographic details
Published in:	IEEE access 2021, Vol.9, p.122855-122871
Main authors:	Ibrahim, Hemin; Loo, Chu Kiong; Alnajjar, Fady
Format:	Article
Language:	eng
Subjects:	Computation; Emotion recognition; Emotions; Feature extraction; Human computer interaction; Human-computer interface; Principal component analysis; random projection; recurrent neural network; Recurrent neural networks; reservoir computing; Reservoirs; Speech emotion recognition; Speech recognition; Time series; Time series analysis; time series classification
Online access:	Full text

container_end_page	122871
container_issue
container_start_page	122855
container_title	IEEE access
container_volume	9
creator	Ibrahim, Hemin; Loo, Chu Kiong; Alnajjar, Fady
description	Many researchers are inspired by studying Speech Emotion Recognition (SER) because it is considered as a key effort in Human-Computer Interaction (HCI). The main focus of this work is to design a model for emotion recognition from speech, which has plenty of challenges within it. Due to the time series and sparse nature of emotion in speech, we have adopted a multivariate time series feature representation of the input data. The work has also adopted the Echo State Network (ESN) which includes reservoir computing as a special case of the Recurrent Neural Network (RNN) to avoid model complexity because of its untrained and sparse nature when mapping the features into a higher dimensional space. Additionally, we applied dimensionality reduction since it offers significant computational advantages by using Sparse Random Projection (SRP). Late fusion of bidirectionality input has been applied to capture additional information independently of the input data. The experiments for speaker-independent and/or speaker-dependent were performed on four common speech emotion datasets which are Emo-DB, SAVEE, RAVDESS, and FAU Aibo Emotion Corpus. The results show that the designed model outperforms the state-of-the-art with a cheaper computation cost.
doi_str_mv	10.1109/ACCESS.2021.3107858
format	Article
fullrecord	<record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_9522131</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9522131</ieee_id><doaj_id>oai_doaj_org_article_c24cb2f79cb34dedab36ee21759a26db</doaj_id><sourcerecordid>2572667869</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-6e1966c1188ba49127f21f03b7f998e76bc8718cdce7bc348b199242ff8ce2ab3</originalsourceid><addsrcrecordid>eNpNUdtKAzEQXURBUb_Al4DPrTvJbi6PutQLFBSr-BiS7KSmtE1NtoJ_764r4rzM7ZwzDKcoLqCcApTq6rppZovFlJYUpgxKIWt5UJxQ4GrCasYP_9XHxXnOq7IP2Y9qcVL4xQ7RvZPZJnYhbskzurjchp_afpG56ZDc7vPQ-pjITWhDQjeszboHZ0yfMSTSxM1u34XtkryF7p08m20bN-QpxdUIPiuOvFlnPP_Np8Xr7eyluZ_MH-8emuv5xFWl7CYcQXHuAKS0plJAhafgS2aFV0qi4NZJAdK1DoV1rJIWlKIV9V46pMay0-Jh1G2jWeldChuTvnQ0Qf8MYlpqk7rg1qgdrZylXihnWdVi27M5IgVRK0N5O2hdjlq7FD_2mDu9ivvU_501rQXlXEiuehQbUS7FnBP6v6tQ6sEfPfqjB3_0rz8962JkBUT8Y6iaUmDAvgG0mY0K</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2572667869</pqid></control><display><type>article</type><title>Speech Emotion Recognition by Late Fusion for Bidirectional Reservoir Computing With Random Projection</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Ibrahim, Hemin ; Loo, Chu Kiong ; Alnajjar, Fady</creator><creatorcontrib>Ibrahim, Hemin ; Loo, Chu Kiong ; Alnajjar, Fady</creatorcontrib><description>Many researchers are inspired by studying Speech Emotion Recognition (SER) because it is considered as a key effort in Human-Computer Interaction (HCI). The main focus of this work is to design a model for emotion recognition from speech, which has plenty of challenges within it. Due to the time series and sparse nature of emotion in speech, we have adopted a multivariate time series feature representation of the input data. 
The work has also adopted the Echo State Network (ESN) which includes reservoir computing as a special case of the Recurrent Neural Network (RNN) to avoid model complexity because of its untrained and sparse nature when mapping the features into a higher dimensional space. Additionally, we applied dimensionality reduction since it offers significant computational advantages by using Sparse Random Projection (SRP). Late fusion of bidirectionality input has been applied to capture additional information independently of the input data. The experiments for speaker-independent and/or speaker-dependent were performed on four common speech emotion datasets which are Emo-DB, SAVEE, RAVDESS, and FAU Aibo Emotion Corpus. The results show that the designed model outperforms the state-of-the-art with a cheaper computation cost.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2021.3107858</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Computation ; Emotion recognition ; Emotions ; Feature extraction ; Human computer interaction ; Human-computer interface ; Principal component analysis ; random projection ; recurrent neural network ; Recurrent neural networks ; reservoir computing ; Reservoirs ; Speech emotion recognition ; Speech recognition ; Time series ; Time series analysis ; time series classification</subject><ispartof>IEEE access, 2021, Vol.9, p.122855-122871</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-6e1966c1188ba49127f21f03b7f998e76bc8718cdce7bc348b199242ff8ce2ab3</citedby><cites>FETCH-LOGICAL-c408t-6e1966c1188ba49127f21f03b7f998e76bc8718cdce7bc348b199242ff8ce2ab3</cites><orcidid>0000-0001-7867-2665 ; 0000-0001-7602-6838 ; 0000-0001-6102-3765</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9522131$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,864,2102,4024,27633,27923,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Ibrahim, Hemin</creatorcontrib><creatorcontrib>Loo, Chu Kiong</creatorcontrib><creatorcontrib>Alnajjar, Fady</creatorcontrib><title>Speech Emotion Recognition by Late Fusion for Bidirectional Reservoir Computing With Random Projection</title><title>IEEE access</title><addtitle>Access</addtitle><description>Many researchers are inspired by studying Speech Emotion Recognition (SER) because it is considered as a key effort in Human-Computer Interaction (HCI). The main focus of this work is to design a model for emotion recognition from speech, which has plenty of challenges within it. Due to the time series and sparse nature of emotion in speech, we have adopted a multivariate time series feature representation of the input data. The work has also adopted the Echo State Network (ESN) which includes reservoir computing as a special case of the Recurrent Neural Network (RNN) to avoid model complexity because of its untrained and sparse nature when mapping the features into a higher dimensional space. Additionally, we applied dimensionality reduction since it offers significant computational advantages by using Sparse Random Projection (SRP). 
Late fusion of bidirectionality input has been applied to capture additional information independently of the input data. The experiments for speaker-independent and/or speaker-dependent were performed on four common speech emotion datasets which are Emo-DB, SAVEE, RAVDESS, and FAU Aibo Emotion Corpus. The results show that the designed model outperforms the state-of-the-art with a cheaper computation cost.</description><subject>Computation</subject><subject>Emotion recognition</subject><subject>Emotions</subject><subject>Feature extraction</subject><subject>Human computer interaction</subject><subject>Human-computer interface</subject><subject>Principal component analysis</subject><subject>random projection</subject><subject>recurrent neural network</subject><subject>Recurrent neural networks</subject><subject>reservoir computing</subject><subject>Reservoirs</subject><subject>Speech emotion recognition</subject><subject>Speech recognition</subject><subject>Time series</subject><subject>Time series analysis</subject><subject>time series classification</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUdtKAzEQXURBUb_Al4DPrTvJbi6PutQLFBSr-BiS7KSmtE1NtoJ_764r4rzM7ZwzDKcoLqCcApTq6rppZovFlJYUpgxKIWt5UJxQ4GrCasYP_9XHxXnOq7IP2Y9qcVL4xQ7RvZPZJnYhbskzurjchp_afpG56ZDc7vPQ-pjITWhDQjeszboHZ0yfMSTSxM1u34XtkryF7p08m20bN-QpxdUIPiuOvFlnPP_Np8Xr7eyluZ_MH-8emuv5xFWl7CYcQXHuAKS0plJAhafgS2aFV0qi4NZJAdK1DoV1rJIWlKIV9V46pMay0-Jh1G2jWeldChuTvnQ0Qf8MYlpqk7rg1qgdrZylXihnWdVi27M5IgVRK0N5O2hdjlq7FD_2mDu9ivvU_501rQXlXEiuehQbUS7FnBP6v6tQ6sEfPfqjB3_0rz8962JkBUT8Y6iaUmDAvgG0mY0K</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>Ibrahim, Hemin</creator><creator>Loo, Chu Kiong</creator><creator>Alnajjar, Fady</creator><general>IEEE</general><general>The Institute of 
Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-7867-2665</orcidid><orcidid>https://orcid.org/0000-0001-7602-6838</orcidid><orcidid>https://orcid.org/0000-0001-6102-3765</orcidid></search><sort><creationdate>2021</creationdate><title>Speech Emotion Recognition by Late Fusion for Bidirectional Reservoir Computing With Random Projection</title><author>Ibrahim, Hemin ; Loo, Chu Kiong ; Alnajjar, Fady</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-6e1966c1188ba49127f21f03b7f998e76bc8718cdce7bc348b199242ff8ce2ab3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Computation</topic><topic>Emotion recognition</topic><topic>Emotions</topic><topic>Feature extraction</topic><topic>Human computer interaction</topic><topic>Human-computer interface</topic><topic>Principal component analysis</topic><topic>random projection</topic><topic>recurrent neural network</topic><topic>Recurrent neural networks</topic><topic>reservoir computing</topic><topic>Reservoirs</topic><topic>Speech emotion recognition</topic><topic>Speech recognition</topic><topic>Time series</topic><topic>Time series analysis</topic><topic>time series classification</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ibrahim, Hemin</creatorcontrib><creatorcontrib>Loo, Chu Kiong</creatorcontrib><creatorcontrib>Alnajjar, Fady</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society 
Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ibrahim, Hemin</au><au>Loo, Chu Kiong</au><au>Alnajjar, Fady</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Speech Emotion Recognition by Late Fusion for Bidirectional Reservoir Computing With Random Projection</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2021</date><risdate>2021</risdate><volume>9</volume><spage>122855</spage><epage>122871</epage><pages>122855-122871</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Many researchers are inspired by studying Speech Emotion Recognition (SER) because it is considered as a key effort in Human-Computer Interaction (HCI). The main focus of this work is to design a model for emotion recognition from speech, which has plenty of challenges within it. Due to the time series and sparse nature of emotion in speech, we have adopted a multivariate time series feature representation of the input data. 
The work has also adopted the Echo State Network (ESN) which includes reservoir computing as a special case of the Recurrent Neural Network (RNN) to avoid model complexity because of its untrained and sparse nature when mapping the features into a higher dimensional space. Additionally, we applied dimensionality reduction since it offers significant computational advantages by using Sparse Random Projection (SRP). Late fusion of bidirectionality input has been applied to capture additional information independently of the input data. The experiments for speaker-independent and/or speaker-dependent were performed on four common speech emotion datasets which are Emo-DB, SAVEE, RAVDESS, and FAU Aibo Emotion Corpus. The results show that the designed model outperforms the state-of-the-art with a cheaper computation cost.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2021.3107858</doi><tpages>17</tpages><orcidid>https://orcid.org/0000-0001-7867-2665</orcidid><orcidid>https://orcid.org/0000-0001-7602-6838</orcidid><orcidid>https://orcid.org/0000-0001-6102-3765</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 2169-3536
ispartof	IEEE access, 2021, Vol.9, p.122855-122871
issn	2169-3536 2169-3536
language	eng
recordid	cdi_ieee_primary_9522131
source	IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects	Computation; Emotion recognition; Emotions; Feature extraction; Human computer interaction; Human-computer interface; Principal component analysis; random projection; recurrent neural network; Recurrent neural networks; reservoir computing; Reservoirs; Speech emotion recognition; Speech recognition; Time series; Time series analysis; time series classification
title	Speech Emotion Recognition by Late Fusion for Bidirectional Reservoir Computing With Random Projection
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T02%3A37%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Speech%20Emotion%20Recognition%20by%20Late%20Fusion%20for%20Bidirectional%20Reservoir%20Computing%20With%20Random%20Projection&rft.jtitle=IEEE%20access&rft.au=Ibrahim,%20Hemin&rft.date=2021&rft.volume=9&rft.spage=122855&rft.epage=122871&rft.pages=122855-122871&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3107858&rft_dat=%3Cproquest_ieee_%3E2572667869%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2572667869&rft_id=info:pmid/&rft_ieee_id=9522131&rft_doaj_id=oai_doaj_org_article_c24cb2f79cb34dedab36ee21759a26db&rfr_iscdi=true