Formant position based weighted spectral features for emotion recognition

► We introduce WMFCC features for emotion recognition from speech. ► The WMFCC is an early data fusion of spectral content and formant location information. ► We experimentally evaluate WMFCC features and late decision fusion methods. ► The WMFCC features and late fusion provide significant improvem...

Bibliographic details
Published in: Speech communication 2011-12, Vol.53 (9), p.1186-1197
Main authors: Bozkurt, Elif; Erzin, Engin; Erdem, Çiğdem Eroğlu; Erdem, A. Tanju
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1197
container_issue 9
container_start_page 1186
container_title Speech communication
container_volume 53
creator Bozkurt, Elif
Erzin, Engin
Erdem, Çiğdem Eroğlu
Erdem, A. Tanju
description ► We introduce WMFCC features for emotion recognition from speech. ► The WMFCC is an early data fusion of spectral content and formant location information. ► We experimentally evaluate WMFCC features and late decision fusion methods. ► The WMFCC features and late fusion provide significant improvements. In this paper, we propose novel spectrally weighted mel-frequency cepstral coefficient (WMFCC) features for emotion recognition from speech. The idea rests on the observation that formant locations carry emotion-related information, so critical spectral bands around formant locations can be emphasized during the calculation of MFCC features. The spectral weighting is derived from the normalized inverse harmonic mean function of the line spectral frequency (LSF) features, which are known to be localized around formant frequencies. This approach can be considered an early data fusion of spectral content and formant location information. We also investigate methods for late decision fusion of unimodal classifiers. We evaluate the proposed WMFCC features together with the standard spectral and prosody features using HMM-based classifiers on the spontaneous FAU Aibo emotional speech corpus. The results show that unimodal classifiers with the WMFCC features perform significantly better than the classifiers with standard spectral features. Late decision fusion of classifiers provides further significant performance improvements.
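The abstract does not give the weighting formula, so the following is only a rough sketch of how an inverse harmonic mean (IHM) weighting over LSFs might be computed. It assumes the commonly cited form w_i = 1/(f_i - f_{i-1}) + 1/(f_{i+1} - f_i) with boundary values 0 and π. The function names, the normalization by the maximum weight, and the linear interpolation onto a frequency grid are illustrative assumptions, not the authors' method.

```python
import numpy as np


def ihm_weights(lsf):
    """Inverse harmonic mean weight per LSF.

    lsf: strictly increasing line spectral frequencies in radians, in (0, pi).
    Closely spaced LSFs (typical near a formant) receive large weights.
    """
    lsf = np.asarray(lsf, dtype=float)
    # Pad with the boundary frequencies 0 and pi (assumed convention).
    padded = np.concatenate(([0.0], lsf, [np.pi]))
    gaps = np.diff(padded)  # distances between neighbouring frequencies
    return 1.0 / gaps[:-1] + 1.0 / gaps[1:]


def spectral_weighting(lsf, n_bins):
    """Hypothetical weighting curve over n_bins frequencies in [0, pi].

    Normalizing by the maximum and interpolating linearly are
    assumptions made for this sketch only.
    """
    lsf = np.asarray(lsf, dtype=float)
    w = ihm_weights(lsf)
    w = w / w.max()
    grid = np.linspace(0.0, np.pi, n_bins)
    # np.interp holds the end values outside [lsf[0], lsf[-1]].
    return np.interp(grid, lsf, w)


if __name__ == "__main__":
    lsf = [0.5, 0.6, 2.0]  # made-up values: the first two lie close together
    print(ihm_weights(lsf))
    print(spectral_weighting(lsf, 8))
```

A curve like this could multiply the power spectrum of a frame before the mel filterbank is applied, which is one way to read "emphasizing critical spectral bands around formant locations"; the actual WMFCC pipeline is defined in the paper itself.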
doi_str_mv 10.1016/j.specom.2011.04.003
format Article
fulltext fulltext
identifier ISSN: 0167-6393
ispartof Speech communication, 2011-12, Vol.53 (9), p.1186-1197
issn 0167-6393
1872-7182
language eng
recordid cdi_proquest_miscellaneous_914634344
source Elsevier ScienceDirect Journals
subjects Classifiers
Decision fusion
Emotion recognition
Emotional speech classification
Emotions
Feature recognition
Formant frequency
Line spectral frequency
Recognition
Spectra
Spectral features
Spectral lines
Speech
title Formant position based weighted spectral features for emotion recognition
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T15%3A37%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Formant%20position%20based%20weighted%20spectral%20features%20for%20emotion%20recognition&rft.jtitle=Speech%20communication&rft.au=Bozkurt,%20Elif&rft.date=2011-12-01&rft.volume=53&rft.issue=9&rft.spage=1186&rft.epage=1197&rft.pages=1186-1197&rft.issn=0167-6393&rft.eissn=1872-7182&rft.coden=SCOMDH&rft_id=info:doi/10.1016/j.specom.2011.04.003&rft_dat=%3Cproquest_cross%3E914297227%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=896164170&rft_id=info:pmid/&rft_els_id=S0167639311000641&rfr_iscdi=true