Deep learning methods in speaker recognition: a review

Full description

Bibliographic details
Published in:	arXiv.org, 2019-11
Main authors:	Sztahó, Dávid; Szaszák, György; Beke, András
Format:	Article
Language:	English
Keywords:	Computer Science - Learning; Computer Science - Sound; Deep learning; Machine learning; Speech recognition; Statistics - Machine Learning; Verification
Online access:	Full text

container_title	arXiv.org
creator	Sztahó, Dávid; Szaszák, György; Beke, András
description	This paper summarizes the deep learning practices applied in speaker recognition, covering both verification and identification. Speaker recognition has been a widely studied topic in speech technology. Many research works have been carried out, yet little progress has been achieved in the past 5-6 years. However, as deep learning (DL) techniques advance in most machine learning fields, they are replacing the former state-of-the-art methods in speaker recognition as well. DL appears to be becoming the new state-of-the-art solution for both speaker verification and identification. Standard x-vectors, in addition to i-vectors, serve as baselines in most recent works. The growing amount of collected data opens up the field to DL methods, which are most effective when large amounts of data are available.
doi_str_mv	10.48550/arxiv.1911.06615
format	Article
publisher	Cornell University Library, arXiv.org (Ithaca)
date	2019-11-14
published_version	https://doi.org/10.3311/PPee.17024 (access to full text may be restricted)
rights	http://arxiv.org/licenses/nonexclusive-distrib/1.0/
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2019-11
issn	2331-8422
language	eng
recordid	cdi_arxiv_primary_1911_06615
source	arXiv.org; Free E-Journals
subjects	Computer Science - Learning; Computer Science - Sound; Deep learning; Machine learning; Speech recognition; Statistics - Machine Learning; Verification
title	Deep learning methods in speaker recognition: a review
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T12%3A28%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20learning%20methods%20in%20speaker%20recognition:%20a%20review&rft.jtitle=arXiv.org&rft.au=Sztah%C3%B3,%20D%C3%A1vid&rft.date=2019-11-14&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.1911.06615&rft_dat=%3Cproquest_arxiv%3E2315359212%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2315359212&rft_id=info:pmid/&rfr_iscdi=true