Deep learning methods in speaker recognition: a review

This paper summarizes applied deep learning practices in speaker recognition, covering both verification and identification. Speaker recognition has long been a widely studied topic in speech technology. Many research works have been carried out, yet little progress has been achieved in the pas...

Bibliographic details
Published in: arXiv.org, 2019-11
Main authors: Sztahó, Dávid; Szaszák, György; Beke, András
Format: Article
Language: English
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Sztahó, Dávid
Szaszák, György
Beke, András
description This paper summarizes applied deep learning practices in speaker recognition, covering both verification and identification. Speaker recognition has long been a widely studied topic in speech technology. Many research works have been carried out, yet little progress has been achieved in the past 5-6 years. However, as deep learning (DL) techniques advance in most machine learning fields, they are replacing the former state-of-the-art methods in speaker recognition too. DL now appears to be the state-of-the-art solution for both speaker verification and identification. Standard x-vectors, in addition to i-vectors, are used as baselines in most of the novel works. The increasing amount of gathered data opens up the territory to DL methods, where they are most effective.
doi_str_mv 10.48550/arxiv.1911.06615
format Article
publisher Ithaca: Cornell University Library, arXiv.org
creationdate 2019-11-14
rights 2019. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa free_for_read
backlink https://doi.org/10.48550/arXiv.1911.06615 (View paper in arXiv)
backlink https://doi.org/10.3311/PPee.17024 (View published paper; access to full text may be restricted)
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2019-11
issn 2331-8422
language eng
recordid cdi_arxiv_primary_1911_06615
source arXiv.org; Free E- Journals
subjects Computer Science - Learning
Computer Science - Sound
Deep learning
Machine learning
Speech recognition
Statistics - Machine Learning
Verification
title Deep learning methods in speaker recognition: a review
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T12%3A28%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20learning%20methods%20in%20speaker%20recognition:%20a%20review&rft.jtitle=arXiv.org&rft.au=Sztah%C3%B3,%20D%C3%A1vid&rft.date=2019-11-14&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.1911.06615&rft_dat=%3Cproquest_arxiv%3E2315359212%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2315359212&rft_id=info:pmid/&rfr_iscdi=true