Neural network based feature transformation for emotion independent speaker identification

In this paper we propose a neural network based feature transformation framework for developing an emotion independent speaker identification system. Most present speaker recognition systems may not perform well in emotional environments. In real life, humans extensively express emotions...

Detailed description

Saved in:
Bibliographic details
Published in: International journal of speech technology, 2012-09, Vol.15 (3), p.335-349
Main authors: Krothapalli, Sreenivasa Rao; Yadav, Jaynath; Sarkar, Sourjya; Koolagudi, Shashidhar G.; Vuppala, Anil Kumar
Format: Article
Language: English
Subjects:
Online access: Full text
Description: In this paper we propose a neural network based feature transformation framework for developing an emotion independent speaker identification system. Most present speaker recognition systems may not perform well in emotional environments. In real life, humans extensively express emotions during conversations to convey their messages effectively. Therefore, in this work we propose a speaker recognition system that is robust to variations in the emotional moods of speakers. Neural network models are explored to transform the speaker-specific spectral features from any specific emotion to neutral. Eight emotions are considered: anger, sadness, disgust, fear, happiness, neutral, sarcasm and surprise. Emotional databases developed in Hindi, Telugu and German are used to analyze the effect of the proposed feature transformation on the performance of the speaker identification system. Spectral features are represented by mel-frequency cepstral coefficients, and speaker models are developed using Gaussian mixture models. The performance of the speaker identification system is analyzed with various feature mapping techniques. Results demonstrate that the proposed neural network based feature transformation improves speaker identification performance by 20%. Feature transformation at the syllable level shows better performance than transformation at the sentence level.
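The abstract describes mapping emotion-specific spectral feature vectors toward their neutral counterparts with a neural network, then doing speaker identification on the mapped features. A rough sketch of that mapping idea — not the authors' architecture, corpus, or dimensions; the feature pairs here are synthetic placeholders — is a small one-hidden-layer network trained with mean-squared error:

```python
import numpy as np

# Illustrative sketch only: a single-hidden-layer network trained with
# MSE to map "emotional" feature vectors to "neutral"-style targets,
# mirroring the paper's feature-transformation idea. All data,
# dimensions and the architecture are invented for illustration.
rng = np.random.default_rng(0)
D_IN = D_OUT = 13          # e.g. 13 MFCCs in and out (assumed)
D_HID = 32                 # hidden width (assumed)

# Synthetic stand-in pairs of (emotional, neutral) feature vectors.
X = rng.normal(size=(500, D_IN))   # stand-in "emotional" features
Y = 0.8 * X + 0.1                  # stand-in "neutral" targets

W1 = rng.normal(scale=0.1, size=(D_IN, D_HID)); b1 = np.zeros(D_HID)
W2 = rng.normal(scale=0.1, size=(D_HID, D_OUT)); b2 = np.zeros(D_OUT)

def forward(x):
    h = np.tanh(x @ W1 + b1)       # nonlinear hidden layer
    return h, h @ W2 + b2          # linear output layer

lr = 0.1
for _ in range(3000):              # plain batch gradient descent on MSE
    h, pred = forward(X)
    err = (pred - Y) / len(X)      # gradient of 0.5*MSE wrt the output
    gW2, gb2 = h.T @ err, err.sum(axis=0)
    dh = (err @ W2.T) * (1.0 - h ** 2)   # backprop through tanh
    gW1, gb1 = X.T @ dh, dh.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, mapped = forward(X)
mse = float(np.mean((mapped - Y) ** 2))
baseline = float(np.mean(Y ** 2))  # error of an all-zeros predictor
```

In the actual system, `X` and `Y` would be MFCC vectors extracted from emotional and neutral speech, and the transformed features would then be scored against Gaussian-mixture speaker models rather than compared to targets directly.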
DOI: 10.1007/s10772-012-9148-2
ISSN: 1381-2416
EISSN: 1572-8110
Source: Springer Nature - Complete Springer Journals
Subjects: Artificial Intelligence
Conveying
Emotions
Engineering
Moods
Neural networks
Signal, Image and Speech Processing
Social Sciences
Spectra
Speech recognition
Syllables
Transformations