Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks

Age estimation from speech has recently received increased interest because it is useful for many applications, such as user profiling, targeted marketing, and personalized call routing. Applications of this kind need to estimate the speaker's age quickly and might greatly benefit from real-time capab...

Bibliographic details
Published in:	IEEE access 2018-01, Vol.6, p.22524-22530
Main authors:	Zazo, Ruben; Sankar Nidadavolu, Phani; Chen, Nanxin; Gonzalez-Rodriguez, Joaquin; Dehak, Najim
Format:	Article
Language:	eng
Subjects:	Age; Automatic age estimation; Chronology; DNN; Estimation; Logic gates; LSTM; Neural networks; NIST; Real time; Real-time systems; Recurrent neural networks; RNN; Speech; Speech recognition; Time response; Voice activity detectors; Voice recognition
Online access:	Full text

container_end_page	22530
container_issue
container_start_page	22524
container_title	IEEE access
container_volume	6
creator	Zazo, Ruben; Sankar Nidadavolu, Phani; Chen, Nanxin; Gonzalez-Rodriguez, Joaquin; Dehak, Najim
description	Age estimation from speech has recently received increased interest because it is useful for many applications, such as user profiling, targeted marketing, and personalized call routing. Applications of this kind need to estimate the speaker's age quickly and might greatly benefit from real-time capabilities. Long short-term memory (LSTM) recurrent neural networks (RNNs) have been shown to outperform state-of-the-art approaches in related speech-based tasks, such as language identification and voice activity detection, especially when an accurate real-time response is required. In this paper, we propose a novel age estimation system based on LSTM-RNNs. The system can handle short utterances (3 to 10 s) and can be easily deployed in a real-time architecture. It has been tested and compared with a state-of-the-art i-vector approach using data from the NIST Speaker Recognition Evaluation 2008 and 2010 data sets. Experiments on short-duration utterances show a relative improvement of up to 28% in mean absolute error over the baseline system.
doi_str_mv	10.1109/ACCESS.2018.2816163
format	Article
fullrecord	<record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_8316819</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8316819</ieee_id><doaj_id>oai_doaj_org_article_cef807ed01b6401f8b0ae5794dc5f2d6</doaj_id><sourcerecordid>2455932719</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-8c8ffaec8708997ab0765b0dc1e2085bbb52a9a65a30311329a15c9f88dedc133</originalsourceid><addsrcrecordid>eNpNkV9LwzAUxYsoKOon2EvA5878adLkcY6pg6lo3XNI01vtnM1MUsRvb2ZFzMsJ4XfOveFk2YTgKSFYXc7m80VVTSkmckolEUSwg-yEEqFyxpk4_Hc_zs5D2OB0Eqd4eZI9zl4ALULs3k3sXI-6HlWvzkdU7QDsK1rHCN70FgK6MgEalJhV9XyHnsAO3kMf0T0M3myTxE_n38JZdtSabYDzXz3N1teL5_ltvnq4Wc5nq9wWWMZcWtm2BqwssVSqNDUuBa9xYwlQLHld15waZQQ3DDNCGFWGcKtaKRtIEGOn2XLMbZzZ6J1PP_Bf2plO_zw4_6KNj53dgrbQSlxCg0ktCkxaWWMDvFRFY3lLG5GyLsasnXcfA4SoN27wfVpf04JzxWhJVKLYSFnvQvDQ_k0lWO-r0GMVel-F_q0iuSajqwOAP4dkRMiU-Q02d4P1</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2455932719</pqid></control><display><type>article</type><title>Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Zazo, Ruben ; Sankar Nidadavolu, Phani ; Chen, Nanxin ; Gonzalez-Rodriguez, Joaquin ; Dehak, Najim</creator><creatorcontrib>Zazo, Ruben ; Sankar Nidadavolu, Phani ; Chen, Nanxin ; Gonzalez-Rodriguez, Joaquin ; Dehak, Najim</creatorcontrib><description>Age estimation from speech has recently received increased interest as it is useful for many applications such as user-profiling, targeted marketing, or personalized call-routing. This kind of applications need to quickly estimate the age of the speaker and might greatly benefit from real-time capabilities. 
Long short-term memory (LSTM) recurrent neural networks (RNN) have shown to outperform state-of-the-art approaches in related speech-based tasks, such as language identification or voice activity detection, especially when an accurate real-time response is required. In this paper, we propose a novel age estimation system based on LSTM-RNNs. This system is able to deal with short utterances (from 3 to 10 s) and it can be easily deployed in a real-time architecture. The proposed system has been tested and compared with a state-of-the-art i-vector approach using data from NIST speaker recognition evaluation 2008 and 2010 data sets. Experiments on short duration utterances show a relative improvement up to 28% in terms of mean absolute error of this new approach over the baseline system.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2018.2816163</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Age ; Automatic age estimation ; Chronology ; DNN ; Estimation ; Logic gates ; LSTM ; Neural networks ; NIST ; Real time ; Real-time systems ; Recurrent neural networks ; RNN ; Speech ; Speech recognition ; Time response ; Voice activity detectors ; Voice recognition</subject><ispartof>IEEE access, 2018-01, Vol.6, p.22524-22530</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2018</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-8c8ffaec8708997ab0765b0dc1e2085bbb52a9a65a30311329a15c9f88dedc133</citedby><cites>FETCH-LOGICAL-c408t-8c8ffaec8708997ab0765b0dc1e2085bbb52a9a65a30311329a15c9f88dedc133</cites><orcidid>0000-0002-4251-1445</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8316819$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,860,2096,27610,27901,27902,54908</link.rule.ids></links><search><creatorcontrib>Zazo, Ruben</creatorcontrib><creatorcontrib>Sankar Nidadavolu, Phani</creatorcontrib><creatorcontrib>Chen, Nanxin</creatorcontrib><creatorcontrib>Gonzalez-Rodriguez, Joaquin</creatorcontrib><creatorcontrib>Dehak, Najim</creatorcontrib><title>Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks</title><title>IEEE access</title><addtitle>Access</addtitle><description>Age estimation from speech has recently received increased interest as it is useful for many applications such as user-profiling, targeted marketing, or personalized call-routing. This kind of applications need to quickly estimate the age of the speaker and might greatly benefit from real-time capabilities. Long short-term memory (LSTM) recurrent neural networks (RNN) have shown to outperform state-of-the-art approaches in related speech-based tasks, such as language identification or voice activity detection, especially when an accurate real-time response is required. In this paper, we propose a novel age estimation system based on LSTM-RNNs. This system is able to deal with short utterances (from 3 to 10 s) and it can be easily deployed in a real-time architecture. 
The proposed system has been tested and compared with a state-of-the-art i-vector approach using data from NIST speaker recognition evaluation 2008 and 2010 data sets. Experiments on short duration utterances show a relative improvement up to 28% in terms of mean absolute error of this new approach over the baseline system.</description><subject>Age</subject><subject>Automatic age estimation</subject><subject>Chronology</subject><subject>DNN</subject><subject>Estimation</subject><subject>Logic gates</subject><subject>LSTM</subject><subject>Neural networks</subject><subject>NIST</subject><subject>Real time</subject><subject>Real-time systems</subject><subject>Recurrent neural networks</subject><subject>RNN</subject><subject>Speech</subject><subject>Speech recognition</subject><subject>Time response</subject><subject>Voice activity detectors</subject><subject>Voice recognition</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNkV9LwzAUxYsoKOon2EvA5878adLkcY6pg6lo3XNI01vtnM1MUsRvb2ZFzMsJ4XfOveFk2YTgKSFYXc7m80VVTSkmckolEUSwg-yEEqFyxpk4_Hc_zs5D2OB0Eqd4eZI9zl4ALULs3k3sXI-6HlWvzkdU7QDsK1rHCN70FgK6MgEalJhV9XyHnsAO3kMf0T0M3myTxE_n38JZdtSabYDzXz3N1teL5_ltvnq4Wc5nq9wWWMZcWtm2BqwssVSqNDUuBa9xYwlQLHld15waZQQ3DDNCGFWGcKtaKRtIEGOn2XLMbZzZ6J1PP_Bf2plO_zw4_6KNj53dgrbQSlxCg0ktCkxaWWMDvFRFY3lLG5GyLsasnXcfA4SoN27wfVpf04JzxWhJVKLYSFnvQvDQ_k0lWO-r0GMVel-F_q0iuSajqwOAP4dkRMiU-Q02d4P1</recordid><startdate>20180101</startdate><enddate>20180101</enddate><creator>Zazo, Ruben</creator><creator>Sankar Nidadavolu, Phani</creator><creator>Chen, Nanxin</creator><creator>Gonzalez-Rodriguez, Joaquin</creator><creator>Dehak, Najim</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-4251-1445</orcidid></search><sort><creationdate>20180101</creationdate><title>Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks</title><author>Zazo, Ruben ; Sankar Nidadavolu, Phani ; Chen, Nanxin ; Gonzalez-Rodriguez, Joaquin ; Dehak, Najim</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-8c8ffaec8708997ab0765b0dc1e2085bbb52a9a65a30311329a15c9f88dedc133</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Age</topic><topic>Automatic age estimation</topic><topic>Chronology</topic><topic>DNN</topic><topic>Estimation</topic><topic>Logic gates</topic><topic>LSTM</topic><topic>Neural networks</topic><topic>NIST</topic><topic>Real time</topic><topic>Real-time systems</topic><topic>Recurrent neural networks</topic><topic>RNN</topic><topic>Speech</topic><topic>Speech recognition</topic><topic>Time response</topic><topic>Voice activity detectors</topic><topic>Voice recognition</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zazo, Ruben</creatorcontrib><creatorcontrib>Sankar Nidadavolu, Phani</creatorcontrib><creatorcontrib>Chen, Nanxin</creatorcontrib><creatorcontrib>Gonzalez-Rodriguez, Joaquin</creatorcontrib><creatorcontrib>Dehak, Najim</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE 
Xplore</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zazo, Ruben</au><au>Sankar Nidadavolu, Phani</au><au>Chen, Nanxin</au><au>Gonzalez-Rodriguez, Joaquin</au><au>Dehak, Najim</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2018-01-01</date><risdate>2018</risdate><volume>6</volume><spage>22524</spage><epage>22530</epage><pages>22524-22530</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Age estimation from speech has recently received increased interest as it is useful for many applications such as user-profiling, targeted marketing, or personalized call-routing. This kind of applications need to quickly estimate the age of the speaker and might greatly benefit from real-time capabilities. Long short-term memory (LSTM) recurrent neural networks (RNN) have shown to outperform state-of-the-art approaches in related speech-based tasks, such as language identification or voice activity detection, especially when an accurate real-time response is required. 
In this paper, we propose a novel age estimation system based on LSTM-RNNs. This system is able to deal with short utterances (from 3 to 10 s) and it can be easily deployed in a real-time architecture. The proposed system has been tested and compared with a state-of-the-art i-vector approach using data from NIST speaker recognition evaluation 2008 and 2010 data sets. Experiments on short duration utterances show a relative improvement up to 28% in terms of mean absolute error of this new approach over the baseline system.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2018.2816163</doi><tpages>7</tpages><orcidid>https://orcid.org/0000-0002-4251-1445</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 2169-3536
ispartof	IEEE access, 2018-01, Vol.6, p.22524-22530
issn	2169-3536 2169-3536
language	eng
recordid	cdi_ieee_primary_8316819
source	IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects	Age; Automatic age estimation; Chronology; DNN; Estimation; Logic gates; LSTM; Neural networks; NIST; Real time; Real-time systems; Recurrent neural networks; RNN; Speech; Speech recognition; Time response; Voice activity detectors; Voice recognition
title	Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T10%3A43%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Age%20Estimation%20in%20Short%20Speech%20Utterances%20Based%20on%20LSTM%20Recurrent%20Neural%20Networks&rft.jtitle=IEEE%20access&rft.au=Zazo,%20Ruben&rft.date=2018-01-01&rft.volume=6&rft.spage=22524&rft.epage=22530&rft.pages=22524-22530&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2018.2816163&rft_dat=%3Cproquest_ieee_%3E2455932719%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2455932719&rft_id=info:pmid/&rft_ieee_id=8316819&rft_doaj_id=oai_doaj_org_article_cef807ed01b6401f8b0ae5794dc5f2d6&rfr_iscdi=true