Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks

Age estimation from speech has recently received increased interest, as it is useful for many applications such as user profiling, targeted marketing, or personalized call routing. Such applications need to estimate the age of the speaker quickly and might greatly benefit from real-time capab...

Full description

Bibliographic details
Published in: IEEE access 2018-01, Vol.6, p.22524-22530
Main authors: Zazo, Ruben; Sankar Nidadavolu, Phani; Chen, Nanxin; Gonzalez-Rodriguez, Joaquin; Dehak, Najim
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 22530
container_issue
container_start_page 22524
container_title IEEE access
container_volume 6
creator Zazo, Ruben
Sankar Nidadavolu, Phani
Chen, Nanxin
Gonzalez-Rodriguez, Joaquin
Dehak, Najim
description Age estimation from speech has recently received increased interest, as it is useful for many applications such as user profiling, targeted marketing, or personalized call routing. Such applications need to estimate the age of the speaker quickly and might greatly benefit from real-time capabilities. Long short-term memory (LSTM) recurrent neural networks (RNNs) have been shown to outperform state-of-the-art approaches in related speech-based tasks, such as language identification or voice activity detection, especially when an accurate real-time response is required. In this paper, we propose a novel age estimation system based on LSTM-RNNs. The system can handle short utterances (from 3 to 10 s) and can be easily deployed in a real-time architecture. The proposed system has been tested and compared with a state-of-the-art i-vector approach using data from the NIST speaker recognition evaluation 2008 and 2010 data sets. Experiments on short-duration utterances show a relative improvement of up to 28% in mean absolute error for the new approach over the baseline system.
doi_str_mv 10.1109/ACCESS.2018.2816163
format Article
fullrecord <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_8316819</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8316819</ieee_id><doaj_id>oai_doaj_org_article_cef807ed01b6401f8b0ae5794dc5f2d6</doaj_id><sourcerecordid>2455932719</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-8c8ffaec8708997ab0765b0dc1e2085bbb52a9a65a30311329a15c9f88dedc133</originalsourceid><addsrcrecordid>eNpNkV9LwzAUxYsoKOon2EvA5878adLkcY6pg6lo3XNI01vtnM1MUsRvb2ZFzMsJ4XfOveFk2YTgKSFYXc7m80VVTSkmckolEUSwg-yEEqFyxpk4_Hc_zs5D2OB0Eqd4eZI9zl4ALULs3k3sXI-6HlWvzkdU7QDsK1rHCN70FgK6MgEalJhV9XyHnsAO3kMf0T0M3myTxE_n38JZdtSabYDzXz3N1teL5_ltvnq4Wc5nq9wWWMZcWtm2BqwssVSqNDUuBa9xYwlQLHld15waZQQ3DDNCGFWGcKtaKRtIEGOn2XLMbZzZ6J1PP_Bf2plO_zw4_6KNj53dgrbQSlxCg0ktCkxaWWMDvFRFY3lLG5GyLsasnXcfA4SoN27wfVpf04JzxWhJVKLYSFnvQvDQ_k0lWO-r0GMVel-F_q0iuSajqwOAP4dkRMiU-Q02d4P1</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2455932719</pqid></control><display><type>article</type><title>Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Zazo, Ruben ; Sankar Nidadavolu, Phani ; Chen, Nanxin ; Gonzalez-Rodriguez, Joaquin ; Dehak, Najim</creator><creatorcontrib>Zazo, Ruben ; Sankar Nidadavolu, Phani ; Chen, Nanxin ; Gonzalez-Rodriguez, Joaquin ; Dehak, Najim</creatorcontrib><description>Age estimation from speech has recently received increased interest as it is useful for many applications such as user-profiling, targeted marketing, or personalized call-routing. This kind of applications need to quickly estimate the age of the speaker and might greatly benefit from real-time capabilities. 
Long short-term memory (LSTM) recurrent neural networks (RNN) have shown to outperform state-of-the-art approaches in related speech-based tasks, such as language identification or voice activity detection, especially when an accurate real-time response is required. In this paper, we propose a novel age estimation system based on LSTM-RNNs. This system is able to deal with short utterances (from 3 to 10 s) and it can be easily deployed in a real-time architecture. The proposed system has been tested and compared with a state-of-the-art i-vector approach using data from NIST speaker recognition evaluation 2008 and 2010 data sets. Experiments on short duration utterances show a relative improvement up to 28% in terms of mean absolute error of this new approach over the baseline system.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2018.2816163</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Age ; Automatic age estimation ; Chronology ; DNN ; Estimation ; Logic gates ; LSTM ; Neural networks ; NIST ; Real time ; Real-time systems ; Recurrent neural networks ; RNN ; Speech ; Speech recognition ; Time response ; Voice activity detectors ; Voice recognition</subject><ispartof>IEEE access, 2018-01, Vol.6, p.22524-22530</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2018</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-8c8ffaec8708997ab0765b0dc1e2085bbb52a9a65a30311329a15c9f88dedc133</citedby><cites>FETCH-LOGICAL-c408t-8c8ffaec8708997ab0765b0dc1e2085bbb52a9a65a30311329a15c9f88dedc133</cites><orcidid>0000-0002-4251-1445</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8316819$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,860,2096,27610,27901,27902,54908</link.rule.ids></links><search><creatorcontrib>Zazo, Ruben</creatorcontrib><creatorcontrib>Sankar Nidadavolu, Phani</creatorcontrib><creatorcontrib>Chen, Nanxin</creatorcontrib><creatorcontrib>Gonzalez-Rodriguez, Joaquin</creatorcontrib><creatorcontrib>Dehak, Najim</creatorcontrib><title>Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks</title><title>IEEE access</title><addtitle>Access</addtitle><description>Age estimation from speech has recently received increased interest as it is useful for many applications such as user-profiling, targeted marketing, or personalized call-routing. This kind of applications need to quickly estimate the age of the speaker and might greatly benefit from real-time capabilities. Long short-term memory (LSTM) recurrent neural networks (RNN) have shown to outperform state-of-the-art approaches in related speech-based tasks, such as language identification or voice activity detection, especially when an accurate real-time response is required. In this paper, we propose a novel age estimation system based on LSTM-RNNs. This system is able to deal with short utterances (from 3 to 10 s) and it can be easily deployed in a real-time architecture. 
The proposed system has been tested and compared with a state-of-the-art i-vector approach using data from NIST speaker recognition evaluation 2008 and 2010 data sets. Experiments on short duration utterances show a relative improvement up to 28% in terms of mean absolute error of this new approach over the baseline system.</description><subject>Age</subject><subject>Automatic age estimation</subject><subject>Chronology</subject><subject>DNN</subject><subject>Estimation</subject><subject>Logic gates</subject><subject>LSTM</subject><subject>Neural networks</subject><subject>NIST</subject><subject>Real time</subject><subject>Real-time systems</subject><subject>Recurrent neural networks</subject><subject>RNN</subject><subject>Speech</subject><subject>Speech recognition</subject><subject>Time response</subject><subject>Voice activity detectors</subject><subject>Voice recognition</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNkV9LwzAUxYsoKOon2EvA5878adLkcY6pg6lo3XNI01vtnM1MUsRvb2ZFzMsJ4XfOveFk2YTgKSFYXc7m80VVTSkmckolEUSwg-yEEqFyxpk4_Hc_zs5D2OB0Eqd4eZI9zl4ALULs3k3sXI-6HlWvzkdU7QDsK1rHCN70FgK6MgEalJhV9XyHnsAO3kMf0T0M3myTxE_n38JZdtSabYDzXz3N1teL5_ltvnq4Wc5nq9wWWMZcWtm2BqwssVSqNDUuBa9xYwlQLHld15waZQQ3DDNCGFWGcKtaKRtIEGOn2XLMbZzZ6J1PP_Bf2plO_zw4_6KNj53dgrbQSlxCg0ktCkxaWWMDvFRFY3lLG5GyLsasnXcfA4SoN27wfVpf04JzxWhJVKLYSFnvQvDQ_k0lWO-r0GMVel-F_q0iuSajqwOAP4dkRMiU-Q02d4P1</recordid><startdate>20180101</startdate><enddate>20180101</enddate><creator>Zazo, Ruben</creator><creator>Sankar Nidadavolu, Phani</creator><creator>Chen, Nanxin</creator><creator>Gonzalez-Rodriguez, Joaquin</creator><creator>Dehak, Najim</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-4251-1445</orcidid></search><sort><creationdate>20180101</creationdate><title>Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks</title><author>Zazo, Ruben ; Sankar Nidadavolu, Phani ; Chen, Nanxin ; Gonzalez-Rodriguez, Joaquin ; Dehak, Najim</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-8c8ffaec8708997ab0765b0dc1e2085bbb52a9a65a30311329a15c9f88dedc133</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Age</topic><topic>Automatic age estimation</topic><topic>Chronology</topic><topic>DNN</topic><topic>Estimation</topic><topic>Logic gates</topic><topic>LSTM</topic><topic>Neural networks</topic><topic>NIST</topic><topic>Real time</topic><topic>Real-time systems</topic><topic>Recurrent neural networks</topic><topic>RNN</topic><topic>Speech</topic><topic>Speech recognition</topic><topic>Time response</topic><topic>Voice activity detectors</topic><topic>Voice recognition</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zazo, Ruben</creatorcontrib><creatorcontrib>Sankar Nidadavolu, Phani</creatorcontrib><creatorcontrib>Chen, Nanxin</creatorcontrib><creatorcontrib>Gonzalez-Rodriguez, Joaquin</creatorcontrib><creatorcontrib>Dehak, Najim</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE 
Xplore</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zazo, Ruben</au><au>Sankar Nidadavolu, Phani</au><au>Chen, Nanxin</au><au>Gonzalez-Rodriguez, Joaquin</au><au>Dehak, Najim</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2018-01-01</date><risdate>2018</risdate><volume>6</volume><spage>22524</spage><epage>22530</epage><pages>22524-22530</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Age estimation from speech has recently received increased interest as it is useful for many applications such as user-profiling, targeted marketing, or personalized call-routing. This kind of applications need to quickly estimate the age of the speaker and might greatly benefit from real-time capabilities. 
Long short-term memory (LSTM) recurrent neural networks (RNN) have shown to outperform state-of-the-art approaches in related speech-based tasks, such as language identification or voice activity detection, especially when an accurate real-time response is required. In this paper, we propose a novel age estimation system based on LSTM-RNNs. This system is able to deal with short utterances (from 3 to 10 s) and it can be easily deployed in a real-time architecture. The proposed system has been tested and compared with a state-of-the-art i-vector approach using data from NIST speaker recognition evaluation 2008 and 2010 data sets. Experiments on short duration utterances show a relative improvement up to 28% in terms of mean absolute error of this new approach over the baseline system.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2018.2816163</doi><tpages>7</tpages><orcidid>https://orcid.org/0000-0002-4251-1445</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2018-01, Vol.6, p.22524-22530
issn 2169-3536
2169-3536
language eng
recordid cdi_ieee_primary_8316819
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Age
Automatic age estimation
Chronology
DNN
Estimation
Logic gates
LSTM
Neural networks
NIST
Real time
Real-time systems
Recurrent neural networks
RNN
Speech
Speech recognition
Time response
Voice activity detectors
Voice recognition
title Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T10%3A43%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Age%20Estimation%20in%20Short%20Speech%20Utterances%20Based%20on%20LSTM%20Recurrent%20Neural%20Networks&rft.jtitle=IEEE%20access&rft.au=Zazo,%20Ruben&rft.date=2018-01-01&rft.volume=6&rft.spage=22524&rft.epage=22530&rft.pages=22524-22530&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2018.2816163&rft_dat=%3Cproquest_ieee_%3E2455932719%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2455932719&rft_id=info:pmid/&rft_ieee_id=8316819&rft_doaj_id=oai_doaj_org_article_cef807ed01b6401f8b0ae5794dc5f2d6&rfr_iscdi=true
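
The abstract in this record reports its result as a relative improvement in mean absolute error (MAE) over a baseline. As a reading aid, the following minimal Python sketch shows how such a figure is typically computed. The function names and all age values are hypothetical and chosen only for illustration; they are not taken from the paper and do not reproduce its results.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages (in years)."""
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


def relative_improvement(baseline_mae, new_mae):
    """Fraction by which the new system reduces the baseline MAE."""
    return (baseline_mae - new_mae) / baseline_mae


# Hypothetical speaker ages and predictions (illustrative only, not paper data).
true_ages = [30, 40, 50, 60]
baseline_predictions = [40, 30, 60, 50]  # every prediction is off by 10 years
new_predictions = [38, 32, 58, 52]       # every prediction is off by 8 years

baseline_mae = mean_absolute_error(true_ages, baseline_predictions)
new_mae = mean_absolute_error(true_ages, new_predictions)
improvement = relative_improvement(baseline_mae, new_mae)

print(f"baseline MAE: {baseline_mae:.1f} years")
print(f"new MAE: {new_mae:.1f} years")
print(f"relative improvement: {improvement:.0%}")
```

With these made-up numbers, the MAE drops from 10 to 8 years, which corresponds to a 20% relative improvement. The abstract's "up to 28%" would be read the same way: the reduction in MAE divided by the baseline MAE.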