Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks
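Published in: | IEEE Access, 2018-01, Vol. 6, pp. 22524-22530 |
---|---|
Main authors: | Zazo, Ruben; Sankar Nidadavolu, Phani; Chen, Nanxin; Gonzalez-Rodriguez, Joaquin; Dehak, Najim |
Format: | Article |
Language: | English |
Subjects: | Age; Automatic age estimation; Chronology; DNN; Estimation; Logic gates; LSTM; Neural networks; NIST; Real time; Real-time systems; Recurrent neural networks; RNN; Speech; Speech recognition; Time response; Voice activity detectors; Voice recognition |
Online access: | Full text |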
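The abstract describes the LSTM-RNN system only at a high level. The sketch below illustrates what an LSTM-based age regressor does at inference time. It is a minimal pure-Python example, not the authors' architecture. A single LSTM layer runs over a sequence of acoustic feature frames, and a linear layer maps the final hidden state to an age in years.

All of the following are hypothetical choices for illustration:

- the single layer and hidden size of 8
- the 20-dimensional input frames
- the random, untrained weights
- reading out only the last hidden state
- the 40-year output bias

A 3 s utterance at an assumed 10 ms frame shift yields roughly 300 frames. The recurrence consumes one frame at a time with a fixed-size state, so the same loop could run as audio arrives. This is the property that makes LSTMs attractive for the real-time use the abstract mentions.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


class LSTMAgeRegressor:
    """Hypothetical single-layer LSTM mapping feature frames to one age (years).

    Weights are random and untrained; this only illustrates the data flow.
    """

    GATES = ("i", "f", "o", "c")  # input, forget, output gates and cell candidate

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # One weight matrix per gate, applied to the concatenation [x_t; h_{t-1}].
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hidden)]
                   for _ in range(n_hidden)]
            for gate in self.GATES
        }
        self.b = {gate: [0.0] * n_hidden for gate in self.GATES}
        self.b["f"] = [1.0] * n_hidden  # forget-gate bias of 1, a common init choice
        # Linear output layer: final hidden state -> scalar age.
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.b_out = 40.0  # hypothetical prior, e.g. mean training-set age

    def step(self, x, h, c):
        """One LSTM time step; returns the new (h, c)."""
        z = list(x) + h  # concatenate current frame and previous hidden state
        pre = {
            gate: [s + bias for s, bias in zip(matvec(self.W[gate], z), self.b[gate])]
            for gate in self.GATES
        }
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["c"]]
        c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]
        h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        return h, c

    def predict(self, frames):
        """Run the LSTM over all frames and regress age from the last hidden state."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in frames:
            h, c = self.step(x, h, c)
        return self.b_out + sum(w * hv for w, hv in zip(self.w_out, h))


if __name__ == "__main__":
    rng = random.Random(1)
    # ~3 s of speech at an assumed 10 ms frame shift -> about 300 frames of
    # 20-dim features (random stand-ins for real acoustic features here).
    frames = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(300)]
    model = LSTMAgeRegressor(n_in=20, n_hidden=8)
    print(f"estimated age: {model.predict(frames):.1f} years")
```

In a real system the weights would be trained against labelled speaker ages in a deep-learning framework, for example by minimizing an absolute or squared error. The pure-Python loop only makes the per-frame recurrence explicit.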
container_end_page | 22530 |
---|---|
container_issue | |
container_start_page | 22524 |
container_title | IEEE access |
container_volume | 6 |
creator | Zazo, Ruben; Sankar Nidadavolu, Phani; Chen, Nanxin; Gonzalez-Rodriguez, Joaquin; Dehak, Najim |
description | Age estimation from speech has recently received increased interest because it is useful for many applications, such as user profiling, targeted marketing, or personalized call routing. Such applications need to estimate the speaker's age quickly and could benefit greatly from real-time capabilities. Long short-term memory (LSTM) recurrent neural networks (RNNs) have been shown to outperform state-of-the-art approaches in related speech-based tasks, such as language identification and voice activity detection, especially when an accurate real-time response is required. In this paper, we propose a novel age estimation system based on LSTM-RNNs. The system can handle short utterances (3 to 10 s) and can easily be deployed in a real-time architecture. It was tested and compared with a state-of-the-art i-vector approach using data from the NIST Speaker Recognition Evaluation 2008 and 2010 data sets. Experiments on short-duration utterances show a relative improvement of up to 28% in mean absolute error over the i-vector baseline. |
doi_str_mv | 10.1109/ACCESS.2018.2816163 |
format | Article |
fullrecord | <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_8316819</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8316819</ieee_id><doaj_id>oai_doaj_org_article_cef807ed01b6401f8b0ae5794dc5f2d6</doaj_id><sourcerecordid>2455932719</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-8c8ffaec8708997ab0765b0dc1e2085bbb52a9a65a30311329a15c9f88dedc133</originalsourceid><addsrcrecordid>eNpNkV9LwzAUxYsoKOon2EvA5878adLkcY6pg6lo3XNI01vtnM1MUsRvb2ZFzMsJ4XfOveFk2YTgKSFYXc7m80VVTSkmckolEUSwg-yEEqFyxpk4_Hc_zs5D2OB0Eqd4eZI9zl4ALULs3k3sXI-6HlWvzkdU7QDsK1rHCN70FgK6MgEalJhV9XyHnsAO3kMf0T0M3myTxE_n38JZdtSabYDzXz3N1teL5_ltvnq4Wc5nq9wWWMZcWtm2BqwssVSqNDUuBa9xYwlQLHld15waZQQ3DDNCGFWGcKtaKRtIEGOn2XLMbZzZ6J1PP_Bf2plO_zw4_6KNj53dgrbQSlxCg0ktCkxaWWMDvFRFY3lLG5GyLsasnXcfA4SoN27wfVpf04JzxWhJVKLYSFnvQvDQ_k0lWO-r0GMVel-F_q0iuSajqwOAP4dkRMiU-Q02d4P1</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2455932719</pqid></control><display><type>article</type><title>Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Zazo, Ruben ; Sankar Nidadavolu, Phani ; Chen, Nanxin ; Gonzalez-Rodriguez, Joaquin ; Dehak, Najim</creator><creatorcontrib>Zazo, Ruben ; Sankar Nidadavolu, Phani ; Chen, Nanxin ; Gonzalez-Rodriguez, Joaquin ; Dehak, Najim</creatorcontrib><description>Age estimation from speech has recently received increased interest as it is useful for many applications such as user-profiling, targeted marketing, or personalized call-routing. This kind of applications need to quickly estimate the age of the speaker and might greatly benefit from real-time capabilities. 
Long short-term memory (LSTM) recurrent neural networks (RNN) have shown to outperform state-of-the-art approaches in related speech-based tasks, such as language identification or voice activity detection, especially when an accurate real-time response is required. In this paper, we propose a novel age estimation system based on LSTM-RNNs. This system is able to deal with short utterances (from 3 to 10 s) and it can be easily deployed in a real-time architecture. The proposed system has been tested and compared with a state-of-the-art i-vector approach using data from NIST speaker recognition evaluation 2008 and 2010 data sets. Experiments on short duration utterances show a relative improvement up to 28% in terms of mean absolute error of this new approach over the baseline system.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2018.2816163</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Age ; Automatic age estimation ; Chronology ; DNN ; Estimation ; Logic gates ; LSTM ; Neural networks ; NIST ; Real time ; Real-time systems ; Recurrent neural networks ; RNN ; Speech ; Speech recognition ; Time response ; Voice activity detectors ; Voice recognition</subject><ispartof>IEEE access, 2018-01, Vol.6, p.22524-22530</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2018</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-8c8ffaec8708997ab0765b0dc1e2085bbb52a9a65a30311329a15c9f88dedc133</citedby><cites>FETCH-LOGICAL-c408t-8c8ffaec8708997ab0765b0dc1e2085bbb52a9a65a30311329a15c9f88dedc133</cites><orcidid>0000-0002-4251-1445</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8316819$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,860,2096,27610,27901,27902,54908</link.rule.ids></links><search><creatorcontrib>Zazo, Ruben</creatorcontrib><creatorcontrib>Sankar Nidadavolu, Phani</creatorcontrib><creatorcontrib>Chen, Nanxin</creatorcontrib><creatorcontrib>Gonzalez-Rodriguez, Joaquin</creatorcontrib><creatorcontrib>Dehak, Najim</creatorcontrib><title>Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks</title><title>IEEE access</title><addtitle>Access</addtitle><description>Age estimation from speech has recently received increased interest as it is useful for many applications such as user-profiling, targeted marketing, or personalized call-routing. This kind of applications need to quickly estimate the age of the speaker and might greatly benefit from real-time capabilities. Long short-term memory (LSTM) recurrent neural networks (RNN) have shown to outperform state-of-the-art approaches in related speech-based tasks, such as language identification or voice activity detection, especially when an accurate real-time response is required. In this paper, we propose a novel age estimation system based on LSTM-RNNs. This system is able to deal with short utterances (from 3 to 10 s) and it can be easily deployed in a real-time architecture. 
The proposed system has been tested and compared with a state-of-the-art i-vector approach using data from NIST speaker recognition evaluation 2008 and 2010 data sets. Experiments on short duration utterances show a relative improvement up to 28% in terms of mean absolute error of this new approach over the baseline system.</description><subject>Age</subject><subject>Automatic age estimation</subject><subject>Chronology</subject><subject>DNN</subject><subject>Estimation</subject><subject>Logic gates</subject><subject>LSTM</subject><subject>Neural networks</subject><subject>NIST</subject><subject>Real time</subject><subject>Real-time systems</subject><subject>Recurrent neural networks</subject><subject>RNN</subject><subject>Speech</subject><subject>Speech recognition</subject><subject>Time response</subject><subject>Voice activity detectors</subject><subject>Voice recognition</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNkV9LwzAUxYsoKOon2EvA5878adLkcY6pg6lo3XNI01vtnM1MUsRvb2ZFzMsJ4XfOveFk2YTgKSFYXc7m80VVTSkmckolEUSwg-yEEqFyxpk4_Hc_zs5D2OB0Eqd4eZI9zl4ALULs3k3sXI-6HlWvzkdU7QDsK1rHCN70FgK6MgEalJhV9XyHnsAO3kMf0T0M3myTxE_n38JZdtSabYDzXz3N1teL5_ltvnq4Wc5nq9wWWMZcWtm2BqwssVSqNDUuBa9xYwlQLHld15waZQQ3DDNCGFWGcKtaKRtIEGOn2XLMbZzZ6J1PP_Bf2plO_zw4_6KNj53dgrbQSlxCg0ktCkxaWWMDvFRFY3lLG5GyLsasnXcfA4SoN27wfVpf04JzxWhJVKLYSFnvQvDQ_k0lWO-r0GMVel-F_q0iuSajqwOAP4dkRMiU-Q02d4P1</recordid><startdate>20180101</startdate><enddate>20180101</enddate><creator>Zazo, Ruben</creator><creator>Sankar Nidadavolu, Phani</creator><creator>Chen, Nanxin</creator><creator>Gonzalez-Rodriguez, Joaquin</creator><creator>Dehak, Najim</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-4251-1445</orcidid></search><sort><creationdate>20180101</creationdate><title>Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks</title><author>Zazo, Ruben ; Sankar Nidadavolu, Phani ; Chen, Nanxin ; Gonzalez-Rodriguez, Joaquin ; Dehak, Najim</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-8c8ffaec8708997ab0765b0dc1e2085bbb52a9a65a30311329a15c9f88dedc133</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Age</topic><topic>Automatic age estimation</topic><topic>Chronology</topic><topic>DNN</topic><topic>Estimation</topic><topic>Logic gates</topic><topic>LSTM</topic><topic>Neural networks</topic><topic>NIST</topic><topic>Real time</topic><topic>Real-time systems</topic><topic>Recurrent neural networks</topic><topic>RNN</topic><topic>Speech</topic><topic>Speech recognition</topic><topic>Time response</topic><topic>Voice activity detectors</topic><topic>Voice recognition</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zazo, Ruben</creatorcontrib><creatorcontrib>Sankar Nidadavolu, Phani</creatorcontrib><creatorcontrib>Chen, Nanxin</creatorcontrib><creatorcontrib>Gonzalez-Rodriguez, Joaquin</creatorcontrib><creatorcontrib>Dehak, Najim</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE 
Xplore</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zazo, Ruben</au><au>Sankar Nidadavolu, Phani</au><au>Chen, Nanxin</au><au>Gonzalez-Rodriguez, Joaquin</au><au>Dehak, Najim</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2018-01-01</date><risdate>2018</risdate><volume>6</volume><spage>22524</spage><epage>22530</epage><pages>22524-22530</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Age estimation from speech has recently received increased interest as it is useful for many applications such as user-profiling, targeted marketing, or personalized call-routing. This kind of applications need to quickly estimate the age of the speaker and might greatly benefit from real-time capabilities. Long short-term memory (LSTM) recurrent neural networks (RNN) have shown to outperform state-of-the-art approaches in related speech-based tasks, such as language identification or voice activity detection, especially when an accurate real-time response is required. 
In this paper, we propose a novel age estimation system based on LSTM-RNNs. This system is able to deal with short utterances (from 3 to 10 s) and it can be easily deployed in a real-time architecture. The proposed system has been tested and compared with a state-of-the-art i-vector approach using data from NIST speaker recognition evaluation 2008 and 2010 data sets. Experiments on short duration utterances show a relative improvement up to 28% in terms of mean absolute error of this new approach over the baseline system.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2018.2816163</doi><tpages>7</tpages><orcidid>https://orcid.org/0000-0002-4251-1445</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2018-01, Vol.6, p.22524-22530 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_ieee_primary_8316819 |
source | IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
subjects | Age; Automatic age estimation; Chronology; DNN; Estimation; Logic gates; LSTM; Neural networks; NIST; Real time; Real-time systems; Recurrent neural networks; RNN; Speech; Speech recognition; Time response; Voice activity detectors; Voice recognition |
title | Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T10%3A43%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Age%20Estimation%20in%20Short%20Speech%20Utterances%20Based%20on%20LSTM%20Recurrent%20Neural%20Networks&rft.jtitle=IEEE%20access&rft.au=Zazo,%20Ruben&rft.date=2018-01-01&rft.volume=6&rft.spage=22524&rft.epage=22530&rft.pages=22524-22530&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2018.2816163&rft_dat=%3Cproquest_ieee_%3E2455932719%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2455932719&rft_id=info:pmid/&rft_ieee_id=8316819&rft_doaj_id=oai_doaj_org_article_cef807ed01b6401f8b0ae5794dc5f2d6&rfr_iscdi=true |
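The headline result is stated as a relative improvement in mean absolute error (MAE). MAE is the average absolute difference, in years, between true and estimated speaker ages. The sketch below computes both quantities. The function names are hypothetical.

The numbers in the usage lines are illustrative only and are not results from the paper. For example, a baseline MAE of 10.0 years falling to 7.2 years corresponds to a 28% relative improvement.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference, in years, between true and estimated ages."""
    if not true_ages or len(true_ages) != len(predicted_ages):
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)


def relative_improvement(baseline_mae, new_mae):
    """Fractional MAE reduction of a new system relative to a baseline."""
    if baseline_mae <= 0:
        raise ValueError("baseline MAE must be positive")
    return (baseline_mae - new_mae) / baseline_mae


if __name__ == "__main__":
    print(mean_absolute_error([30, 45, 60], [35, 40, 58]))  # prints 4.0
    # Illustrative values only (not results from the paper):
    print(f"{relative_improvement(10.0, 7.2):.0%}")  # prints 28%
```

Because the improvement is relative, the same 28% figure could correspond to different absolute reductions in years depending on the baseline MAE. The per-condition baseline numbers are therefore needed to interpret it.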