An Innovative Method for Speech Signal Emotion Recognition Based on Spectral Features Using GMM and HMM Techniques

Speech is one of the communication processes of humans. One of the important features of speech is to convey the inner feelings of the person to the listener. When a speech is expressed by the speaker, this speech also contains the feelings of the person, which leads to the creation of thoughts and...

Bibliographic Details
Published in:	Wireless personal communications, 2024, Vol. 134 (2), p. 735-753
Main Authors:	Al-Dujaili Al-Khazraji, Mohammed Jawad; Ebrahimi-Moghadam, Abbas
Format:	Article
Language:	eng
Subjects:	Classification; Communications Engineering; Computer Communication Networks; Diagnostic systems; Emotion recognition; Emotional factors; Emotions; Engineering; English language; Markov chains; Networks; Principal components analysis; Redundancy; Signal,Image and Speech Processing; Speech; Speech recognition
Online Access:	Full text

container_end_page	753
container_issue	2
container_start_page	735
container_title	Wireless personal communications
container_volume	134
creator	Al-Dujaili Al-Khazraji, Mohammed Jawad; Ebrahimi-Moghadam, Abbas
description	Speech is one of the ways humans communicate, and one of its important functions is to convey the speaker's inner feelings to the listener. Spoken language therefore carries the speaker's emotions, which leads to the creation of thoughts and behaviors appropriate to oneself. Speech Emotion Recognition (SER) is an important problem in human–machine interaction; the growing use of computers and their impact on daily life have made cooperation between humans and machines a widely researched topic. This article examines SER in English and Persian. Time-frequency features such as Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC), and Perceptual Linear Prediction (PLP) are extracted from the data as feature vectors, combined with one another, and a suitable subset of features is selected from them. Principal component analysis (PCA) is then used to reduce dimensionality and eliminate redundancy while retaining most of the intrinsic information content of the pattern. Each emotional state is then classified using a Gaussian Mixture Model (GMM) and a Hidden Markov Model (HMM). On the English database, combining MFCC + PLP features, PCA, and HMM classification produces the average diagnostic rate, with a precision of 88.85% and a runtime of 0.3 s; similarly, on the Persian database, combining PLP features, PCA, and HMM classification produces the average diagnostic rate, with a precision of 90.21% and a runtime of 0.4 s. The experimental results show that, with these combinations of features and classifiers, the proposed approach attains a high and stable level of detection performance for every emotional state.
doi_str_mv	10.1007/s11277-024-10918-6
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3039356878</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3039356878</sourcerecordid><originalsourceid>FETCH-LOGICAL-c270t-1d19f86b4bb9312a7a12cc4317a101a24aef5f144c595449430c6360ab35ac03</originalsourceid><addsrcrecordid>eNp9kEtLAzEUhYMoWKt_wFXA9WhuknlkWUtf0CLYCu5CJpNpp7RJTaYF_71pR3Dn6pzFd87lHoQegTwDIflLAKB5nhDKEyACiiS7Qj1Ic5oUjH9eox4RVCQZBXqL7kLYEhJjgvaQH1g8s9adVNucDF6YduMqXDuPlwdj9AYvm7VVOzzau7ZxFr8b7da2ufhXFUyFo4mobn2kxka1R28C_giNXePJYoGVrfA06iqW2ebraMI9uqnVLpiHX-2j1Xi0Gk6T-dtkNhzME01z0iZQgaiLrORlKRhQlSugWnMG0RBQlCtTpzVwrlORci44IzpjGVElS5UmrI-eutqDd-ezrdy6o4-_BMkIEyzNiryIFO0o7V0I3tTy4Ju98t8SiDxPK7tpZZxWXqaVWQyxLhQibNfG_1X_k_oBQYp7LA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3039356878</pqid></control><display><type>article</type><title>An Innovative Method for Speech Signal Emotion Recognition Based on Spectral Features Using GMM and HMM Techniques</title><source>SpringerNature Journals</source><creator>Al-Dujaili Al-Khazraji, Mohammed Jawad ; Ebrahimi-Moghadam, Abbas</creator><creatorcontrib>Al-Dujaili Al-Khazraji, Mohammed Jawad ; Ebrahimi-Moghadam, Abbas</creatorcontrib><description>Speech is one of the communication processes of humans. One of the important features of speech is to convey the inner feelings of the person to the listener. When a speech is expressed by the speaker, this speech also contains the feelings of the person, which leads to the creation of thoughts and behaviors appropriate to oneself. Speech Emotion Recognition (SER) is a very important issue in the field of human–machine interaction. The expansion of the use of computers and its impact on today's life has caused this mutual cooperation between man and machine to be widely investigated and researched. In this article, SER in English and Persian has been examined. 
Frequency time characteristics such as Mel- Frequency Cepstral Coefficient (MFCC), Linear Predictive Coding and Predictive Linear Perceptual (PLP) are extracted from the data as feature vectors, then they are combined with each other and a selection of suitable features from them. Also, Principal components analysis (PCA) is used to reduce dimensions and eliminate redundancy while retaining most of the intrinsic information content of the pattern. Then, each emotional state was classified using the Gaussian Mixtures Model (GMM) and Hidden Markov Model (HMM) technique. Combining the MFCC + PLP properties, PCA features, and HMM classification with a precision of 88.85% and a runtime of 0.3 s produces the average diagnostic rate in the English database; similarly, the PLP properties, PCA features, and HMM classification with a precision of 90.21% and a runtime of 0.4 s produce the average diagnostic rate in the Persian database. Based on the combination of features and classifications, the experimental results demonstrated that the suggested approach can attain a high level of stable detection performance for every emotional state.</description><identifier>ISSN: 0929-6212</identifier><identifier>EISSN: 1572-834X</identifier><identifier>DOI: 10.1007/s11277-024-10918-6</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Classification ; Communications Engineering ; Computer Communication Networks ; Diagnostic systems ; Emotion recognition ; Emotional factors ; Emotions ; Engineering ; English language ; Markov chains ; Networks ; Principal components analysis ; Redundancy ; Signal,Image and Speech Processing ; Speech ; Speech recognition</subject><ispartof>Wireless personal communications, 2024, Vol.134 (2), p.735-753</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024. Springer Nature or its licensor (e.g. 
a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c270t-1d19f86b4bb9312a7a12cc4317a101a24aef5f144c595449430c6360ab35ac03</cites><orcidid>0000-0002-3804-6667</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11277-024-10918-6$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11277-024-10918-6$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>315,782,786,27931,27932,41495,42564,51326</link.rule.ids></links><search><creatorcontrib>Al-Dujaili Al-Khazraji, Mohammed Jawad</creatorcontrib><creatorcontrib>Ebrahimi-Moghadam, Abbas</creatorcontrib><title>An Innovative Method for Speech Signal Emotion Recognition Based on Spectral Features Using GMM and HMM Techniques</title><title>Wireless personal communications</title><addtitle>Wireless Pers Commun</addtitle><description>Speech is one of the communication processes of humans. One of the important features of speech is to convey the inner feelings of the person to the listener. When a speech is expressed by the speaker, this speech also contains the feelings of the person, which leads to the creation of thoughts and behaviors appropriate to oneself. Speech Emotion Recognition (SER) is a very important issue in the field of human–machine interaction. The expansion of the use of computers and its impact on today's life has caused this mutual cooperation between man and machine to be widely investigated and researched. 
In this article, SER in English and Persian has been examined. Frequency time characteristics such as Mel- Frequency Cepstral Coefficient (MFCC), Linear Predictive Coding and Predictive Linear Perceptual (PLP) are extracted from the data as feature vectors, then they are combined with each other and a selection of suitable features from them. Also, Principal components analysis (PCA) is used to reduce dimensions and eliminate redundancy while retaining most of the intrinsic information content of the pattern. Then, each emotional state was classified using the Gaussian Mixtures Model (GMM) and Hidden Markov Model (HMM) technique. Combining the MFCC + PLP properties, PCA features, and HMM classification with a precision of 88.85% and a runtime of 0.3 s produces the average diagnostic rate in the English database; similarly, the PLP properties, PCA features, and HMM classification with a precision of 90.21% and a runtime of 0.4 s produce the average diagnostic rate in the Persian database. 
Based on the combination of features and classifications, the experimental results demonstrated that the suggested approach can attain a high level of stable detection performance for every emotional state.</description><subject>Classification</subject><subject>Communications Engineering</subject><subject>Computer Communication Networks</subject><subject>Diagnostic systems</subject><subject>Emotion recognition</subject><subject>Emotional factors</subject><subject>Emotions</subject><subject>Engineering</subject><subject>English language</subject><subject>Markov chains</subject><subject>Networks</subject><subject>Principal components analysis</subject><subject>Redundancy</subject><subject>Signal,Image and Speech Processing</subject><subject>Speech</subject><subject>Speech recognition</subject><issn>0929-6212</issn><issn>1572-834X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9kEtLAzEUhYMoWKt_wFXA9WhuknlkWUtf0CLYCu5CJpNpp7RJTaYF_71pR3Dn6pzFd87lHoQegTwDIflLAKB5nhDKEyACiiS7Qj1Ic5oUjH9eox4RVCQZBXqL7kLYEhJjgvaQH1g8s9adVNucDF6YduMqXDuPlwdj9AYvm7VVOzzau7ZxFr8b7da2ufhXFUyFo4mobn2kxka1R28C_giNXePJYoGVrfA06iqW2ebraMI9uqnVLpiHX-2j1Xi0Gk6T-dtkNhzME01z0iZQgaiLrORlKRhQlSugWnMG0RBQlCtTpzVwrlORci44IzpjGVElS5UmrI-eutqDd-ezrdy6o4-_BMkIEyzNiryIFO0o7V0I3tTy4Ju98t8SiDxPK7tpZZxWXqaVWQyxLhQibNfG_1X_k_oBQYp7LA</recordid><startdate>2024</startdate><enddate>2024</enddate><creator>Al-Dujaili Al-Khazraji, Mohammed Jawad</creator><creator>Ebrahimi-Moghadam, Abbas</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0002-3804-6667</orcidid></search><sort><creationdate>2024</creationdate><title>An Innovative Method for Speech Signal Emotion Recognition Based on Spectral Features Using GMM and HMM Techniques</title><author>Al-Dujaili Al-Khazraji, Mohammed Jawad ; Ebrahimi-Moghadam, 
Abbas</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c270t-1d19f86b4bb9312a7a12cc4317a101a24aef5f144c595449430c6360ab35ac03</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Classification</topic><topic>Communications Engineering</topic><topic>Computer Communication Networks</topic><topic>Diagnostic systems</topic><topic>Emotion recognition</topic><topic>Emotional factors</topic><topic>Emotions</topic><topic>Engineering</topic><topic>English language</topic><topic>Markov chains</topic><topic>Networks</topic><topic>Principal components analysis</topic><topic>Redundancy</topic><topic>Signal,Image and Speech Processing</topic><topic>Speech</topic><topic>Speech recognition</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Al-Dujaili Al-Khazraji, Mohammed Jawad</creatorcontrib><creatorcontrib>Ebrahimi-Moghadam, Abbas</creatorcontrib><collection>CrossRef</collection><jtitle>Wireless personal communications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Al-Dujaili Al-Khazraji, Mohammed Jawad</au><au>Ebrahimi-Moghadam, Abbas</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>An Innovative Method for Speech Signal Emotion Recognition Based on Spectral Features Using GMM and HMM Techniques</atitle><jtitle>Wireless personal communications</jtitle><stitle>Wireless Pers Commun</stitle><date>2024</date><risdate>2024</risdate><volume>134</volume><issue>2</issue><spage>735</spage><epage>753</epage><pages>735-753</pages><issn>0929-6212</issn><eissn>1572-834X</eissn><abstract>Speech is one of the communication processes of humans. One of the important features of speech is to convey the inner feelings of the person to the listener. 
When a speech is expressed by the speaker, this speech also contains the feelings of the person, which leads to the creation of thoughts and behaviors appropriate to oneself. Speech Emotion Recognition (SER) is a very important issue in the field of human–machine interaction. The expansion of the use of computers and its impact on today's life has caused this mutual cooperation between man and machine to be widely investigated and researched. In this article, SER in English and Persian has been examined. Frequency time characteristics such as Mel- Frequency Cepstral Coefficient (MFCC), Linear Predictive Coding and Predictive Linear Perceptual (PLP) are extracted from the data as feature vectors, then they are combined with each other and a selection of suitable features from them. Also, Principal components analysis (PCA) is used to reduce dimensions and eliminate redundancy while retaining most of the intrinsic information content of the pattern. Then, each emotional state was classified using the Gaussian Mixtures Model (GMM) and Hidden Markov Model (HMM) technique. Combining the MFCC + PLP properties, PCA features, and HMM classification with a precision of 88.85% and a runtime of 0.3 s produces the average diagnostic rate in the English database; similarly, the PLP properties, PCA features, and HMM classification with a precision of 90.21% and a runtime of 0.4 s produce the average diagnostic rate in the Persian database. Based on the combination of features and classifications, the experimental results demonstrated that the suggested approach can attain a high level of stable detection performance for every emotional state.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s11277-024-10918-6</doi><tpages>19</tpages><orcidid>https://orcid.org/0000-0002-3804-6667</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 0929-6212
ispartof	Wireless personal communications, 2024, Vol.134 (2), p.735-753
issn	0929-6212 1572-834X
language	eng
recordid	cdi_proquest_journals_3039356878
source	SpringerNature Journals
subjects	Classification; Communications Engineering; Computer Communication Networks; Diagnostic systems; Emotion recognition; Emotional factors; Emotions; Engineering; English language; Markov chains; Networks; Principal components analysis; Redundancy; Signal,Image and Speech Processing; Speech; Speech recognition
title	An Innovative Method for Speech Signal Emotion Recognition Based on Spectral Features Using GMM and HMM Techniques
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-04T05%3A56%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20Innovative%20Method%20for%20Speech%20Signal%20Emotion%20Recognition%20Based%20on%20Spectral%20Features%20Using%20GMM%20and%20HMM%20Techniques&rft.jtitle=Wireless%20personal%20communications&rft.au=Al-Dujaili%20Al-Khazraji,%20Mohammed%20Jawad&rft.date=2024&rft.volume=134&rft.issue=2&rft.spage=735&rft.epage=753&rft.pages=735-753&rft.issn=0929-6212&rft.eissn=1572-834X&rft_id=info:doi/10.1007/s11277-024-10918-6&rft_dat=%3Cproquest_cross%3E3039356878%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3039356878&rft_id=info:pmid/&rfr_iscdi=true