Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification

Bibliographic details
Main authors:	Heo, Hee-Soo; Nam, KiHyun; Lee, Bong-Jin; Kwon, Youngki; Lee, Minjae; Kim, You Jin; Chung, Joon Son
Format:	Article
Language:	English
Subjects:	Computer Science - Sound

creator	Heo, Hee-Soo; Nam, KiHyun; Lee, Bong-Jin; Kwon, Youngki; Lee, Minjae; Kim, You Jin; Chung, Joon Son
description	In the field of speaker verification, session or channel variability poses a significant challenge. While many contemporary methods aim to disentangle session information from speaker embeddings, we introduce a novel approach that uses an additional embedding to represent the session information. This is achieved by training an auxiliary network appended to the speaker embedding extractor, which remains fixed during this training. This yields two similarity scores: one for the speaker information and one for the session information. The latter score acts as a compensator for the former, which might be skewed by session variations. Our extensive experiments demonstrate that session information can be effectively compensated without retraining the embedding extractor.
doi_str_mv	10.48550/arxiv.2309.14741
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2309_14741</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2309_14741</sourcerecordid><originalsourceid>FETCH-LOGICAL-a671-17c93b7c5248073ed414e61276a26e50ce2d88b9afe99435e2c24fb6c449816d3</originalsourceid><addsrcrecordid>eNpNj8tKw0AUhmfjQqoP4Mp5gcTMJTMZd1LqBQKFtnQbzkxO6qFpUiax2Lc3VhFXP_-FHz7G7kSW6iLPsweIn3RKpcpcKrTV4pq1KxzfqdtTt-NrHAbqO76FSOCppfH8yEs8YYTd_35x8FjXUzLwpo9_8ar3H8PYTY5Tx9dHhD1GvsVIDQUYp8kNu2qgHfD2V2ds87zYzF-TcvnyNn8qEzBWJMIGp7wNudRFZhXWWmg0QloD0mCeBZR1UXgHDTqnVY4ySN14E7R2hTC1mrH7n9sLbnWMdIB4rr6xqwu2-gLbqlRH</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification</title><source>arXiv.org</source><creator>Heo, Hee-Soo ; Nam, KiHyun ; Lee, Bong-Jin ; Kwon, Youngki ; Lee, Minjae ; Kim, You Jin ; Chung, Joon Son</creator><creatorcontrib>Heo, Hee-Soo ; Nam, KiHyun ; Lee, Bong-Jin ; Kwon, Youngki ; Lee, Minjae ; Kim, You Jin ; Chung, Joon Son</creatorcontrib><description>In the field of speaker verification, session or channel variability poses a significant challenge. While many contemporary methods aim to disentangle session information from speaker embeddings, we introduce a novel approach using an additional embedding to represent the session information. This is achieved by training an auxiliary network appended to the speaker embedding extractor which remains fixed in this training process. This results in two similarity scores: one for the speakers information and one for the session information. The latter score acts as a compensator for the former that might be skewed due to session variations. 
Our extensive experiments demonstrate that session information can be effectively compensated without retraining of the embedding extractor.</description><identifier>DOI: 10.48550/arxiv.2309.14741</identifier><language>eng</language><subject>Computer Science - Sound</subject><creationdate>2023-09</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2309.14741$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2309.14741$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Heo, Hee-Soo</creatorcontrib><creatorcontrib>Nam, KiHyun</creatorcontrib><creatorcontrib>Lee, Bong-Jin</creatorcontrib><creatorcontrib>Kwon, Youngki</creatorcontrib><creatorcontrib>Lee, Minjae</creatorcontrib><creatorcontrib>Kim, You Jin</creatorcontrib><creatorcontrib>Chung, Joon Son</creatorcontrib><title>Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification</title><description>In the field of speaker verification, session or channel variability poses a significant challenge. While many contemporary methods aim to disentangle session information from speaker embeddings, we introduce a novel approach using an additional embedding to represent the session information. This is achieved by training an auxiliary network appended to the speaker embedding extractor which remains fixed in this training process. This results in two similarity scores: one for the speakers information and one for the session information. 
The latter score acts as a compensator for the former that might be skewed due to session variations. Our extensive experiments demonstrate that session information can be effectively compensated without retraining of the embedding extractor.</description><subject>Computer Science - Sound</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpNj8tKw0AUhmfjQqoP4Mp5gcTMJTMZd1LqBQKFtnQbzkxO6qFpUiax2Lc3VhFXP_-FHz7G7kSW6iLPsweIn3RKpcpcKrTV4pq1KxzfqdtTt-NrHAbqO76FSOCppfH8yEs8YYTd_35x8FjXUzLwpo9_8ar3H8PYTY5Tx9dHhD1GvsVIDQUYp8kNu2qgHfD2V2ds87zYzF-TcvnyNn8qEzBWJMIGp7wNudRFZhXWWmg0QloD0mCeBZR1UXgHDTqnVY4ySN14E7R2hTC1mrH7n9sLbnWMdIB4rr6xqwu2-gLbqlRH</recordid><startdate>20230926</startdate><enddate>20230926</enddate><creator>Heo, Hee-Soo</creator><creator>Nam, KiHyun</creator><creator>Lee, Bong-Jin</creator><creator>Kwon, Youngki</creator><creator>Lee, Minjae</creator><creator>Kim, You Jin</creator><creator>Chung, Joon Son</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20230926</creationdate><title>Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification</title><author>Heo, Hee-Soo ; Nam, KiHyun ; Lee, Bong-Jin ; Kwon, Youngki ; Lee, Minjae ; Kim, You Jin ; Chung, Joon Son</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a671-17c93b7c5248073ed414e61276a26e50ce2d88b9afe99435e2c24fb6c449816d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Sound</topic><toplevel>online_resources</toplevel><creatorcontrib>Heo, Hee-Soo</creatorcontrib><creatorcontrib>Nam, KiHyun</creatorcontrib><creatorcontrib>Lee, Bong-Jin</creatorcontrib><creatorcontrib>Kwon, Youngki</creatorcontrib><creatorcontrib>Lee, Minjae</creatorcontrib><creatorcontrib>Kim, You Jin</creatorcontrib><creatorcontrib>Chung, Joon Son</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Heo, Hee-Soo</au><au>Nam, KiHyun</au><au>Lee, Bong-Jin</au><au>Kwon, Youngki</au><au>Lee, Minjae</au><au>Kim, You Jin</au><au>Chung, Joon Son</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification</atitle><date>2023-09-26</date><risdate>2023</risdate><abstract>In the field of speaker verification, session or channel variability poses a significant challenge. While many contemporary methods aim to disentangle session information from speaker embeddings, we introduce a novel approach using an additional embedding to represent the session information. This is achieved by training an auxiliary network appended to the speaker embedding extractor which remains fixed in this training process. This results in two similarity scores: one for the speakers information and one for the session information. The latter score acts as a compensator for the former that might be skewed due to session variations. Our extensive experiments demonstrate that session information can be effectively compensated without retraining of the embedding extractor.</abstract><doi>10.48550/arxiv.2309.14741</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2309.14741
language	eng
recordid	cdi_arxiv_primary_2309_14741
source	arXiv.org
subjects	Computer Science - Sound
title	Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T22%3A49%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Rethinking%20Session%20Variability:%20Leveraging%20Session%20Embeddings%20for%20Session%20Robustness%20in%20Speaker%20Verification&rft.au=Heo,%20Hee-Soo&rft.date=2023-09-26&rft_id=info:doi/10.48550/arxiv.2309.14741&rft_dat=%3Carxiv_GOX%3E2309_14741%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
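The description field outlines a two-score scheme. A fixed speaker embedding extractor yields a speaker similarity score, and an auxiliary network yields a session similarity score that compensates the first. The sketch below illustrates only the scoring step, under assumptions that the record does not confirm: both embeddings for each utterance are already computed, similarity is measured with cosine similarity, and the two scores are fused linearly as s_spk - alpha * s_ses with a hypothetical weight alpha. The names `cosine`, `compensated_score` and `alpha` are illustrative and do not describe the authors' actual fusion rule or network architecture.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def compensated_score(
    spk_enroll: Sequence[float],
    spk_test: Sequence[float],
    ses_enroll: Sequence[float],
    ses_test: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Speaker score compensated by a session score (hypothetical fusion).

    ASSUMPTION: the linear rule s_spk - alpha * s_ses and the default
    alpha are illustrative only; they are not taken from the paper.
    """
    # Speaker similarity from the fixed (not retrained) embedding extractor.
    s_spk = cosine(spk_enroll, spk_test)
    # Session similarity from the auxiliary network's session embeddings.
    s_ses = cosine(ses_enroll, ses_test)
    # Under this assumed rule, high session similarity lowers the final score.
    return s_spk - alpha * s_ses


if __name__ == "__main__":
    spk = [1.0, 0.0]
    # The speaker embeddings are identical in both trials; only sessions differ.
    same_session = compensated_score(spk, spk, [0.0, 1.0], [0.0, 1.0])
    diff_session = compensated_score(spk, spk, [0.0, 1.0], [1.0, 0.0])
    print(same_session, diff_session)  # prints 0.5 1.0
```

Under this assumed linear rule, a trial whose two recordings look alike in session space (high s_ses) has its speaker score pulled down, which matches the record's description of the session score as a compensator for a speaker score skewed by session variations. The weight alpha would presumably need to be tuned on held-out trials.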