Formant position based weighted spectral features for emotion recognition
► We introduce WMFCC features for emotion recognition from speech. ► The WMFCC is an early data fusion of spectral content and formant location information. ► We experimentally evaluate WMFCC features and late decision fusion methods. ► The WMFCC features and late fusion provide significant improvem...
Published in: | Speech communication 2011-12, Vol.53 (9), p.1186-1197 |
---|---|
Main authors: | Bozkurt, Elif ; Erzin, Engin ; Erdem, Çiğdem Eroğlu ; Erdem, A. Tanju |
Format: | Article |
Language: | eng |
Online access: | Full text |
container_end_page | 1197 |
---|---|
container_issue | 9 |
container_start_page | 1186 |
container_title | Speech communication |
container_volume | 53 |
creator | Bozkurt, Elif ; Erzin, Engin ; Erdem, Çiğdem Eroğlu ; Erdem, A. Tanju |
description | ► We introduce WMFCC features for emotion recognition from speech. ► The WMFCC is an early data fusion of spectral content and formant location information. ► We experimentally evaluate WMFCC features and late decision fusion methods. ► The WMFCC features and late fusion provide significant improvements.
In this paper, we propose novel spectrally weighted mel-frequency cepstral coefficient (WMFCC) features for emotion recognition from speech. The idea is based on the fact that formant locations carry emotion-related information, and therefore critical spectral bands around formant locations can be emphasized during the calculation of MFCC features. The spectral weighting is derived from the normalized inverse harmonic mean function of the line spectral frequency (LSF) features, which are known to be localized around formant frequencies. The above approach can be considered as an early data fusion of spectral content and formant location information. We also investigate methods for late decision fusion of unimodal classifiers. We evaluate the proposed WMFCC features together with the standard spectral and prosody features using HMM-based classifiers on the spontaneous FAU Aibo emotional speech corpus. The results show that unimodal classifiers with the WMFCC features perform significantly better than the classifiers with standard spectral features. Late decision fusion of classifiers provides further significant performance improvements. |
doi_str_mv | 10.1016/j.specom.2011.04.003 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_914634344</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0167639311000641</els_id><sourcerecordid>914297227</sourcerecordid><originalsourceid>FETCH-LOGICAL-c402t-42fe875691e85c4c34eeeaa34c09b7815825a9178f634005daf7e3a7f2def0cb3</originalsourceid><addsrcrecordid>eNqNkT1PwzAQhi0EEqXwDxiyMSWcPxI7CxKqKCAhscBsuc65uGriYqcg_j1uywxMd8PzvCfdS8glhYoCba5XVdqgDX3FgNIKRAXAj8iEKslKSRU7JpOMybLhLT8lZymtAEAoxSbkcR5ib4ax2ITkRx-GYmESdsUn-uXbmJdd8hjNunBoxm3EVLgQC-zDHo757HLYi-fkxJl1woufOSWv87uX2UP59Hz_OLt9Kq0ANpaCOVSyblqKqrbCcoGIxnBhoV1IRWvFatNSqVzDBUDdGSeRG-lYhw7sgk_J1SF3E8P7FtOoe58srtdmwLBNuqUim1yI_5CslYzJP0nVNrQRVEImxYG0MaQU0elN9L2JX5qC3pWhV_pQht6VoUHoXEbWbg4a5s98eIw6WY-Dxc7nF466C_73gG8qPZUL</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>896164170</pqid></control><display><type>article</type><title>Formant position based weighted spectral features for emotion recognition</title><source>Elsevier ScienceDirect Journals</source><creator>Bozkurt, Elif ; Erzin, Engin ; Erdem, Çiğdem Eroğlu ; Erdem, A. Tanju</creator><creatorcontrib>Bozkurt, Elif ; Erzin, Engin ; Erdem, Çiğdem Eroğlu ; Erdem, A. Tanju</creatorcontrib><description>► We introduce WMFCC features for emotion recognition from speech. ► The WMFCC is an early data fusion of spectral content and formant location information. ► We experimentally evaluate WMFCC features and late decision fusion methods. ► The WMFCC features and late fusion provide significant improvements.
In this paper, we propose novel spectrally weighted mel-frequency cepstral coefficient (WMFCC) features for emotion recognition from speech. The idea is based on the fact that formant locations carry emotion-related information, and therefore critical spectral bands around formant locations can be emphasized during the calculation of MFCC features. The spectral weighting is derived from the normalized inverse harmonic mean function of the line spectral frequency (LSF) features, which are known to be localized around formant frequencies. The above approach can be considered as an early data fusion of spectral content and formant location information. We also investigate methods for late decision fusion of unimodal classifiers. We evaluate the proposed WMFCC features together with the standard spectral and prosody features using HMM-based classifiers on the spontaneous FAU Aibo emotional speech corpus. The results show that unimodal classifiers with the WMFCC features perform significantly better than the classifiers with standard spectral features. Late decision fusion of classifiers provides further significant performance improvements.</description><identifier>ISSN: 0167-6393</identifier><identifier>EISSN: 1872-7182</identifier><identifier>DOI: 10.1016/j.specom.2011.04.003</identifier><identifier>CODEN: SCOMDH</identifier><language>eng</language><publisher>Elsevier B.V</publisher><subject>Classifiers ; Decision fusion ; Emotion recognition ; Emotional speech classification ; Emotions ; Feature recognition ; Formant frequency ; Line spectral frequency ; Recognition ; Spectra ; Spectral features ; Spectral lines ; Speech</subject><ispartof>Speech communication, 2011-12, Vol.53 (9), p.1186-1197</ispartof><rights>2011 Elsevier B.V.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c402t-42fe875691e85c4c34eeeaa34c09b7815825a9178f634005daf7e3a7f2def0cb3</citedby><cites>FETCH-LOGICAL-c402t-42fe875691e85c4c34eeeaa34c09b7815825a9178f634005daf7e3a7f2def0cb3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0167639311000641$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids></links><search><creatorcontrib>Bozkurt, Elif</creatorcontrib><creatorcontrib>Erzin, Engin</creatorcontrib><creatorcontrib>Erdem, Çiğdem Eroğlu</creatorcontrib><creatorcontrib>Erdem, A. Tanju</creatorcontrib><title>Formant position based weighted spectral features for emotion recognition</title><title>Speech communication</title><description>► We introduce WMFCC features for emotion recognition from speech. ► The WMFCC is an early data fusion of spectral content and formant location information. ► We experimentally evaluate WMFCC features and late decision fusion methods. ► The WMFCC features and late fusion provide significant improvements.
In this paper, we propose novel spectrally weighted mel-frequency cepstral coefficient (WMFCC) features for emotion recognition from speech. The idea is based on the fact that formant locations carry emotion-related information, and therefore critical spectral bands around formant locations can be emphasized during the calculation of MFCC features. The spectral weighting is derived from the normalized inverse harmonic mean function of the line spectral frequency (LSF) features, which are known to be localized around formant frequencies. The above approach can be considered as an early data fusion of spectral content and formant location information. We also investigate methods for late decision fusion of unimodal classifiers. We evaluate the proposed WMFCC features together with the standard spectral and prosody features using HMM-based classifiers on the spontaneous FAU Aibo emotional speech corpus. The results show that unimodal classifiers with the WMFCC features perform significantly better than the classifiers with standard spectral features. Late decision fusion of classifiers provides further significant performance improvements.</description><subject>Classifiers</subject><subject>Decision fusion</subject><subject>Emotion recognition</subject><subject>Emotional speech classification</subject><subject>Emotions</subject><subject>Feature recognition</subject><subject>Formant frequency</subject><subject>Line spectral frequency</subject><subject>Recognition</subject><subject>Spectra</subject><subject>Spectral features</subject><subject>Spectral lines</subject><subject>Speech</subject><issn>0167-6393</issn><issn>1872-7182</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2011</creationdate><recordtype>article</recordtype><recordid>eNqNkT1PwzAQhi0EEqXwDxiyMSWcPxI7CxKqKCAhscBsuc65uGriYqcg_j1uywxMd8PzvCfdS8glhYoCba5XVdqgDX3FgNIKRAXAj8iEKslKSRU7JpOMybLhLT8lZymtAEAoxSbkcR5ib4ax2ITkRx-GYmESdsUn-uXbmJdd8hjNunBoxm3EVLgQC-zDHo757HLYi-fkxJl1woufOSWv87uX2UP59Hz_OLt9Kq0ANpaCOVSyblqKqrbCcoGIxnBhoV1IRWvFatNSqVzDBUDdGSeRG-lYhw7sgk_J1SF3E8P7FtOoe58srtdmwLBNuqUim1yI_5CslYzJP0nVNrQRVEImxYG0MaQU0elN9L2JX5qC3pWhV_pQht6VoUHoXEbWbg4a5s98eIw6WY-Dxc7nF466C_73gG8qPZUL</recordid><startdate>20111201</startdate><enddate>20111201</enddate><creator>Bozkurt, Elif</creator><creator>Erzin, Engin</creator><creator>Erdem, Çiğdem Eroğlu</creator><creator>Erdem, A. Tanju</creator><general>Elsevier B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7T9</scope><scope>8BM</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20111201</creationdate><title>Formant position based weighted spectral features for emotion recognition</title><author>Bozkurt, Elif ; Erzin, Engin ; Erdem, Çiğdem Eroğlu ; Erdem, A. Tanju</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c402t-42fe875691e85c4c34eeeaa34c09b7815825a9178f634005daf7e3a7f2def0cb3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2011</creationdate><topic>Classifiers</topic><topic>Decision fusion</topic><topic>Emotion recognition</topic><topic>Emotional speech classification</topic><topic>Emotions</topic><topic>Feature recognition</topic><topic>Formant frequency</topic><topic>Line spectral frequency</topic><topic>Recognition</topic><topic>Spectra</topic><topic>Spectral features</topic><topic>Spectral lines</topic><topic>Speech</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Bozkurt, Elif</creatorcontrib><creatorcontrib>Erzin, Engin</creatorcontrib><creatorcontrib>Erdem, Çiğdem Eroğlu</creatorcontrib><creatorcontrib>Erdem, A. Tanju</creatorcontrib><collection>CrossRef</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><collection>ComDisDome</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Speech communication</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Bozkurt, Elif</au><au>Erzin, Engin</au><au>Erdem, Çiğdem Eroğlu</au><au>Erdem, A. Tanju</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Formant position based weighted spectral features for emotion recognition</atitle><jtitle>Speech communication</jtitle><date>2011-12-01</date><risdate>2011</risdate><volume>53</volume><issue>9</issue><spage>1186</spage><epage>1197</epage><pages>1186-1197</pages><issn>0167-6393</issn><eissn>1872-7182</eissn><coden>SCOMDH</coden><abstract>► We introduce WMFCC features for emotion recognition from speech. ► The WMFCC is an early data fusion of spectral content and formant location information. ► We experimentally evaluate WMFCC features and late decision fusion methods. ► The WMFCC features and late fusion provide significant improvements.
In this paper, we propose novel spectrally weighted mel-frequency cepstral coefficient (WMFCC) features for emotion recognition from speech. The idea is based on the fact that formant locations carry emotion-related information, and therefore critical spectral bands around formant locations can be emphasized during the calculation of MFCC features. The spectral weighting is derived from the normalized inverse harmonic mean function of the line spectral frequency (LSF) features, which are known to be localized around formant frequencies. The above approach can be considered as an early data fusion of spectral content and formant location information. We also investigate methods for late decision fusion of unimodal classifiers. We evaluate the proposed WMFCC features together with the standard spectral and prosody features using HMM-based classifiers on the spontaneous FAU Aibo emotional speech corpus. The results show that unimodal classifiers with the WMFCC features perform significantly better than the classifiers with standard spectral features. Late decision fusion of classifiers provides further significant performance improvements.</abstract><pub>Elsevier B.V</pub><doi>10.1016/j.specom.2011.04.003</doi><tpages>12</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0167-6393 |
ispartof | Speech communication, 2011-12, Vol.53 (9), p.1186-1197 |
issn | 0167-6393 1872-7182 |
language | eng |
recordid | cdi_proquest_miscellaneous_914634344 |
source | Elsevier ScienceDirect Journals |
subjects | Classifiers ; Decision fusion ; Emotion recognition ; Emotional speech classification ; Emotions ; Feature recognition ; Formant frequency ; Line spectral frequency ; Recognition ; Spectra ; Spectral features ; Spectral lines ; Speech |
title | Formant position based weighted spectral features for emotion recognition |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T15%3A37%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Formant%20position%20based%20weighted%20spectral%20features%20for%20emotion%20recognition&rft.jtitle=Speech%20communication&rft.au=Bozkurt,%20Elif&rft.date=2011-12-01&rft.volume=53&rft.issue=9&rft.spage=1186&rft.epage=1197&rft.pages=1186-1197&rft.issn=0167-6393&rft.eissn=1872-7182&rft.coden=SCOMDH&rft_id=info:doi/10.1016/j.specom.2011.04.003&rft_dat=%3Cproquest_cross%3E914297227%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=896164170&rft_id=info:pmid/&rft_els_id=S0167639311000641&rfr_iscdi=true |
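
The abstract in this record describes the WMFCC spectral weighting only in words. As a rough illustration, the Python sketch below computes a normalized inverse harmonic mean (IHM) weighting from one frame's LSF vector and applies it to that frame's power spectrum, which is the step that would precede the usual mel filterbank in an MFCC pipeline. Everything in it is an assumption made for illustration and may differ from the authors' formulation: the per-LSF weight 1/(f_i - f_{i-1}) + 1/(f_{i+1} - f_i), the boundary values 0 and π, the linear interpolation onto FFT bins, the normalization to a maximum of 1, and the hypothetical function names `ihm_weights`, `spectral_weighting`, and `weight_power_spectrum`.

```python
import numpy as np


def ihm_weights(lsf):
    """Per-LSF inverse harmonic mean weight (assumed form, not taken from the paper).

    w_i = 1/(f_i - f_{i-1}) + 1/(f_{i+1} - f_i), with f_0 = 0 and f_{p+1} = pi.
    Closely spaced LSFs, which tend to sit near formants, get large weights.
    """
    lsf = np.asarray(lsf, dtype=float)
    padded = np.concatenate(([0.0], lsf, [np.pi]))
    gaps = np.diff(padded)  # p + 1 gaps between neighbouring frequencies
    if np.any(gaps <= 0):
        raise ValueError("LSFs must be strictly increasing inside (0, pi)")
    return 1.0 / gaps[:-1] + 1.0 / gaps[1:]


def spectral_weighting(lsf, n_bins):
    """Interpolate the per-LSF weights onto n_bins frequencies in [0, pi].

    The curve is scaled so that its maximum is 1 (an assumed normalization).
    """
    weights = ihm_weights(lsf)
    omega = np.linspace(0.0, np.pi, n_bins)
    # np.interp holds the first/last weight constant outside the LSF range.
    curve = np.interp(omega, lsf, weights)
    return curve / curve.max()


def weight_power_spectrum(power_spectrum, lsf):
    """Emphasize bands around closely spaced LSFs in a single frame's spectrum."""
    power_spectrum = np.asarray(power_spectrum, dtype=float)
    return power_spectrum * spectral_weighting(lsf, power_spectrum.shape[-1])


if __name__ == "__main__":
    # Hypothetical 6th-order LSF vector in radians; two close pairs mimic formants.
    example_lsf = [0.3, 0.35, 1.0, 1.1, 2.0, 2.6]
    flat_spectrum = np.ones(257)
    weighted = weight_power_spectrum(flat_spectrum, example_lsf)
    print(weighted[:5])
```

In a full pipeline the weighted spectrum would be passed to the mel filterbank, log, and DCT stages in place of the unweighted one; those stages are left out here because the record does not specify them.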