Real-Time Speech Emotion Analysis for Smart Home Assistants

Artificial Intelligence (AI) based Speech Emotion Recognition (SER) has been widely used in the consumer field to control smart home personal assistants, with many such devices on the market. However, with increasing computational power and connectivity, and the need to enable people to live in their homes for longer through the use of technology, a smart home assistant that can detect human emotion would improve communication between the user and the assistant, enabling the assistant to offer more productive feedback. The aim of this work is therefore to analyze emotional states in speech, to propose a method that balances performance against complexity for deployment in consumer electronics home products, and to present a practical live demonstration of the research. This article introduces a comprehensive approach to emotion analysis based on human speech: a one-dimensional convolutional neural network (1-D CNN) is implemented to learn and classify the emotions expressed in speech. The approach is evaluated on standard emotion-classification datasets, the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and the Toronto Emotional Speech Set (TESS; Young and Old speakers), achieving classification accuracies of 90.48%, 95.79%, and 94.47% on these datasets. We conclude that the 1-D CNN classification models used in speaker-independent experiments are highly effective at automatically predicting emotion and are well suited to deployment in smart home assistants.
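
The abstract describes the pipeline only at a high level: features extracted from each utterance are fed to a 1-D CNN that outputs an emotion class, and the subject terms below (Mel frequency cepstral coefficient, filter banks) point to an MFCC front end. As a rough illustration only, and not the authors' published code, the following sketch shows what such a pipeline commonly looks like in Python; the librosa and Keras calls are standard, but the clip length, layer sizes, kernel widths, and the time-averaged 40-coefficient MFCC input are all assumptions.

```python
# Hypothetical sketch of an MFCC + 1-D CNN speech-emotion classifier in the
# spirit of the paper. Architecture details are illustrative assumptions,
# not the authors' published model.
import numpy as np
import librosa
from tensorflow.keras import layers, models

N_MFCC = 40     # MFCC coefficients per frame (a common choice, assumed here)
N_CLASSES = 8   # RAVDESS labels eight emotions; TESS labels seven

def extract_mfcc(path: str) -> np.ndarray:
    """Load one clip and return its time-averaged MFCC vector, shape (N_MFCC,)."""
    y, sr = librosa.load(path, duration=3.0)            # ~3 s window, assumed
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)
    return mfcc.mean(axis=1)                            # collapse the time axis

def build_model() -> models.Sequential:
    """1-D CNN over the MFCC vector, ending in a softmax over emotion classes."""
    model = models.Sequential([
        layers.Input(shape=(N_MFCC, 1)),
        layers.Conv1D(64, kernel_size=5, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=5, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Dropout(0.3),
        layers.Flatten(),
        layers.Dense(N_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage with placeholder paths and integer emotion labels:
#   X = np.stack([extract_mfcc(p) for p in wav_paths])[..., np.newaxis]
#   model = build_model()
#   model.fit(X, labels, epochs=50, validation_split=0.2)
```

Collapsing the MFCCs to a single time-averaged vector keeps the model small, which matches the performance-versus-complexity trade-off the abstract emphasizes for on-device, near-real-time use; whether the paper averages over time or keeps the full frame sequence is not stated in this record.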

Bibliographic details

Published in: IEEE Transactions on Consumer Electronics, 2021-02, Vol. 67 (1), pp. 68-76
Authors: Chatterjee, Rajdeep; Mazumdar, Saptarshi; Sherratt, R. Simon; Halder, Rohit; Maitra, Tanmoy; Giri, Debasis
Format: Article
Language: English
DOI: 10.1109/TCE.2021.3056421
ISSN: 0098-3063
EISSN: 1558-4127
Source: IEEE Electronic Library (IEL)
Subjects:
Artificial intelligence
Artificial neural networks
Classification
convolutional neural network
Covariance matrices
Datasets
Emotion recognition
emotion recognition system
Emotional factors
Emotions
Feature extraction
Filter banks
Hidden Markov models
Household goods
Mel frequency cepstral coefficient
Psychology
Smart buildings
smart home assistants
Smart homes
Speech
Speech recognition