Neural network based feature transformation for emotion independent speaker identification
Saved in:
Published in: | International journal of speech technology, 2012-09, Vol. 15 (3), p. 335-349 |
---|---|
Main authors: | Krothapalli, Sreenivasa Rao; Yadav, Jaynath; Sarkar, Sourjya; Koolagudi, Shashidhar G.; Vuppala, Anil Kumar |
Format: | Article |
Language: | English |
Subjects: | Artificial Intelligence; Conveying; Emotions; Engineering; Moods; Neural networks; Signal, Image and Speech Processing; Social Sciences; Spectra; Speech recognition; Syllables; Transformations |
Online access: | Full text |
creator | Krothapalli, Sreenivasa Rao; Yadav, Jaynath; Sarkar, Sourjya; Koolagudi, Shashidhar G.; Vuppala, Anil Kumar |
description | In this paper we propose a neural network based feature transformation framework for developing an emotion independent speaker identification system. Most present speaker recognition systems may not perform well in emotional environments, yet in real life humans extensively express emotions during conversations to convey their messages effectively. We therefore propose a speaker recognition system that is robust to variations in the emotional moods of speakers. Neural network models are explored to transform speaker-specific spectral features from any given emotion to neutral. Eight emotions are considered: Anger, Sad, Disgust, Fear, Happy, Neutral, Sarcastic and Surprise. Emotional databases developed in Hindi, Telugu and German are used to analyze the effect of the proposed feature transformation on the performance of the speaker identification system. Spectral features are represented by mel-frequency cepstral coefficients, and speaker models are developed using Gaussian mixture models. The performance of the speaker identification system is analyzed with various feature mapping techniques. Results demonstrate that the proposed neural network based feature transformation improves speaker identification performance by 20 %. Feature transformation at the syllable level shows better performance than transformation at the sentence level. |
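The abstract describes the core idea — a neural network that maps emotion-specific spectral features toward their neutral counterparts — without giving the architecture. As a rough illustration only (the paper's actual network, training data, and hyperparameters are not in this record), the following NumPy sketch trains a one-hidden-layer network on synthetic paired "emotional"/"neutral" 13-dimensional vectors standing in for MFCC frames; every dimension and constant here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 13  # MFCC-like feature dimensionality (illustrative; the paper uses MFCCs)

# Synthetic paired data: "emotional" frames X and "neutral" targets Y related
# by an unknown affine map plus noise (a stand-in for real paired speech frames).
A_true = np.eye(D) + rng.normal(scale=0.3, size=(D, D))
b_true = rng.normal(scale=0.1, size=D)
X = rng.normal(size=(2000, D))
Y = X @ A_true.T + b_true + 0.01 * rng.normal(size=(2000, D))

# One-hidden-layer feature-mapping network: x -> tanh(x W1 + b1) W2 + b2
H = 32
W1 = rng.normal(scale=0.1, size=(D, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(H, D)); b2 = np.zeros(D)

lr = 0.02
for _ in range(3000):
    Z = np.tanh(X @ W1 + b1)        # hidden activations
    P = Z @ W2 + b2                 # predicted "neutral" features
    G = 2.0 * (P - Y) / len(X)      # gradient of mean-squared error w.r.t. P
    gW2, gb2 = Z.T @ G, G.sum(0)
    GZ = (G @ W2.T) * (1.0 - Z**2)  # backpropagate through tanh
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * (X.T @ GZ); b1 -= lr * GZ.sum(0)

mse_mapped = np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - Y) ** 2)
mse_identity = np.mean((X - Y) ** 2)  # baseline: use emotional features unmapped
print(f"mapped MSE {mse_mapped:.3f} vs identity MSE {mse_identity:.3f}")
```

On this synthetic data the learned mapping should reduce the mean-squared distance to the "neutral" targets well below the do-nothing baseline, which is the effect the paper pursues with real emotional speech.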
doi | 10.1007/s10772-012-9148-2 |
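The abstract also states that speaker models are Gaussian mixture models. A standard GMM-based identification scheme (not necessarily the paper's exact configuration) fits one GMM per enrolled speaker and picks the model with the highest log-likelihood over the test frames. The sketch below, using scikit-learn on synthetic MFCC-like data, shows that decision rule; speaker names and all parameters are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
D = 13  # MFCC-like feature dimensionality

# Hypothetical data: each "speaker" contributes a cluster of MFCC-like frames
# (a real system would extract MFCCs from speech, after feature mapping).
centers = {s: rng.normal(scale=2.0, size=D) for s in ("spk_a", "spk_b", "spk_c")}
train = {s: c + rng.normal(size=(500, D)) for s, c in centers.items()}

# One GMM per enrolled speaker, trained on that speaker's features.
models = {s: GaussianMixture(n_components=4, covariance_type="diag",
                             random_state=0).fit(F)
          for s, F in train.items()}

def identify(frames):
    """Return the enrolled speaker whose model gives the highest average
    log-likelihood over the test frames (the standard GMM decision rule)."""
    return max(models, key=lambda s: models[s].score(frames))

test_frames = centers["spk_b"] + rng.normal(size=(200, D))
print(identify(test_frames))  # prints spk_b on this synthetic data
```

Diagonal covariances keep per-speaker models cheap; real systems typically use more mixture components and far more training frames per speaker.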
identifier | ISSN: 1381-2416 |
issn | 1381-2416; 1572-8110 |
source | Springer Nature - Complete Springer Journals |
subjects | Artificial Intelligence; Conveying; Emotions; Engineering; Moods; Neural networks; Signal, Image and Speech Processing; Social Sciences; Spectra; Speech recognition; Syllables; Transformations |