Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification

In the field of speaker verification, session or channel variability poses a significant challenge. While many contemporary methods aim to disentangle session information from speaker embeddings, we introduce a novel approach using an additional embedding to represent the session information. This i...

Bibliographic Details
Main authors: Heo, Hee-Soo; Nam, KiHyun; Lee, Bong-Jin; Kwon, Youngki; Lee, Minjae; Kim, You Jin; Chung, Joon Son
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Heo, Hee-Soo
Nam, KiHyun
Lee, Bong-Jin
Kwon, Youngki
Lee, Minjae
Kim, You Jin
Chung, Joon Son
description In the field of speaker verification, session or channel variability poses a significant challenge. While many contemporary methods aim to disentangle session information from speaker embeddings, we introduce a novel approach using an additional embedding to represent the session information. This is achieved by training an auxiliary network appended to the speaker embedding extractor, which remains fixed during this training process. This results in two similarity scores: one for the speaker information and one for the session information. The latter score acts as a compensator for the former, which might be skewed by session variations. Our extensive experiments demonstrate that session information can be effectively compensated without retraining the embedding extractor.
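The two-score idea in the description can be illustrated with a minimal sketch. This record does not state how the scores are combined, so the linear form below, the subtraction, the `weight` parameter, and the function names are all assumptions made for illustration, not the authors' method. The sketch assumes cosine similarity for both scores and that a high session similarity inflates the speaker score.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compensated_score(spk_enroll, spk_test, ses_enroll, ses_test, weight=0.5):
    """Combine a speaker score with a session score.

    Hypothetical fusion rule (an assumption, not taken from the record):
    subtract a weighted session similarity from the speaker similarity,
    so that trials recorded in matching sessions are penalized.
    """
    speaker_score = cosine(spk_enroll, spk_test)
    session_score = cosine(ses_enroll, ses_test)
    return speaker_score - weight * session_score


# Same speaker embedding in both trials; only the session embeddings differ.
same_session = compensated_score([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
diff_session = compensated_score([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

Under these assumptions, the trial with matching sessions receives a lower final score than the trial with mismatched sessions, even though the speaker similarity is identical. In practice the weight would need to be tuned on held-out trials.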
doi_str_mv 10.48550/arxiv.2309.14741
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2309_14741</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2309_14741</sourcerecordid><originalsourceid>FETCH-LOGICAL-a671-17c93b7c5248073ed414e61276a26e50ce2d88b9afe99435e2c24fb6c449816d3</originalsourceid><addsrcrecordid>eNpNj8tKw0AUhmfjQqoP4Mp5gcTMJTMZd1LqBQKFtnQbzkxO6qFpUiax2Lc3VhFXP_-FHz7G7kSW6iLPsweIn3RKpcpcKrTV4pq1KxzfqdtTt-NrHAbqO76FSOCppfH8yEs8YYTd_35x8FjXUzLwpo9_8ar3H8PYTY5Tx9dHhD1GvsVIDQUYp8kNu2qgHfD2V2ds87zYzF-TcvnyNn8qEzBWJMIGp7wNudRFZhXWWmg0QloD0mCeBZR1UXgHDTqnVY4ySN14E7R2hTC1mrH7n9sLbnWMdIB4rr6xqwu2-gLbqlRH</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification</title><source>arXiv.org</source><creator>Heo, Hee-Soo ; Nam, KiHyun ; Lee, Bong-Jin ; Kwon, Youngki ; Lee, Minjae ; Kim, You Jin ; Chung, Joon Son</creator><creatorcontrib>Heo, Hee-Soo ; Nam, KiHyun ; Lee, Bong-Jin ; Kwon, Youngki ; Lee, Minjae ; Kim, You Jin ; Chung, Joon Son</creatorcontrib><description>In the field of speaker verification, session or channel variability poses a significant challenge. While many contemporary methods aim to disentangle session information from speaker embeddings, we introduce a novel approach using an additional embedding to represent the session information. This is achieved by training an auxiliary network appended to the speaker embedding extractor which remains fixed in this training process. This results in two similarity scores: one for the speakers information and one for the session information. The latter score acts as a compensator for the former that might be skewed due to session variations. 
Our extensive experiments demonstrate that session information can be effectively compensated without retraining of the embedding extractor.</description><identifier>DOI: 10.48550/arxiv.2309.14741</identifier><language>eng</language><subject>Computer Science - Sound</subject><creationdate>2023-09</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2309.14741$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2309.14741$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Heo, Hee-Soo</creatorcontrib><creatorcontrib>Nam, KiHyun</creatorcontrib><creatorcontrib>Lee, Bong-Jin</creatorcontrib><creatorcontrib>Kwon, Youngki</creatorcontrib><creatorcontrib>Lee, Minjae</creatorcontrib><creatorcontrib>Kim, You Jin</creatorcontrib><creatorcontrib>Chung, Joon Son</creatorcontrib><title>Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification</title><description>In the field of speaker verification, session or channel variability poses a significant challenge. While many contemporary methods aim to disentangle session information from speaker embeddings, we introduce a novel approach using an additional embedding to represent the session information. This is achieved by training an auxiliary network appended to the speaker embedding extractor which remains fixed in this training process. This results in two similarity scores: one for the speakers information and one for the session information. 
The latter score acts as a compensator for the former that might be skewed due to session variations. Our extensive experiments demonstrate that session information can be effectively compensated without retraining of the embedding extractor.</description><subject>Computer Science - Sound</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpNj8tKw0AUhmfjQqoP4Mp5gcTMJTMZd1LqBQKFtnQbzkxO6qFpUiax2Lc3VhFXP_-FHz7G7kSW6iLPsweIn3RKpcpcKrTV4pq1KxzfqdtTt-NrHAbqO76FSOCppfH8yEs8YYTd_35x8FjXUzLwpo9_8ar3H8PYTY5Tx9dHhD1GvsVIDQUYp8kNu2qgHfD2V2ds87zYzF-TcvnyNn8qEzBWJMIGp7wNudRFZhXWWmg0QloD0mCeBZR1UXgHDTqnVY4ySN14E7R2hTC1mrH7n9sLbnWMdIB4rr6xqwu2-gLbqlRH</recordid><startdate>20230926</startdate><enddate>20230926</enddate><creator>Heo, Hee-Soo</creator><creator>Nam, KiHyun</creator><creator>Lee, Bong-Jin</creator><creator>Kwon, Youngki</creator><creator>Lee, Minjae</creator><creator>Kim, You Jin</creator><creator>Chung, Joon Son</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20230926</creationdate><title>Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification</title><author>Heo, Hee-Soo ; Nam, KiHyun ; Lee, Bong-Jin ; Kwon, Youngki ; Lee, Minjae ; Kim, You Jin ; Chung, Joon Son</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a671-17c93b7c5248073ed414e61276a26e50ce2d88b9afe99435e2c24fb6c449816d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Sound</topic><toplevel>online_resources</toplevel><creatorcontrib>Heo, Hee-Soo</creatorcontrib><creatorcontrib>Nam, KiHyun</creatorcontrib><creatorcontrib>Lee, Bong-Jin</creatorcontrib><creatorcontrib>Kwon, Youngki</creatorcontrib><creatorcontrib>Lee, Minjae</creatorcontrib><creatorcontrib>Kim, You 
Jin</creatorcontrib><creatorcontrib>Chung, Joon Son</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Heo, Hee-Soo</au><au>Nam, KiHyun</au><au>Lee, Bong-Jin</au><au>Kwon, Youngki</au><au>Lee, Minjae</au><au>Kim, You Jin</au><au>Chung, Joon Son</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification</atitle><date>2023-09-26</date><risdate>2023</risdate><abstract>In the field of speaker verification, session or channel variability poses a significant challenge. While many contemporary methods aim to disentangle session information from speaker embeddings, we introduce a novel approach using an additional embedding to represent the session information. This is achieved by training an auxiliary network appended to the speaker embedding extractor which remains fixed in this training process. This results in two similarity scores: one for the speakers information and one for the session information. The latter score acts as a compensator for the former that might be skewed due to session variations. Our extensive experiments demonstrate that session information can be effectively compensated without retraining of the embedding extractor.</abstract><doi>10.48550/arxiv.2309.14741</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2309.14741
ispartof
issn
language eng
recordid cdi_arxiv_primary_2309_14741
source arXiv.org
subjects Computer Science - Sound
title Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T22%3A49%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Rethinking%20Session%20Variability:%20Leveraging%20Session%20Embeddings%20for%20Session%20Robustness%20in%20Speaker%20Verification&rft.au=Heo,%20Hee-Soo&rft.date=2023-09-26&rft_id=info:doi/10.48550/arxiv.2309.14741&rft_dat=%3Carxiv_GOX%3E2309_14741%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true