Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification
In the field of speaker verification, session or channel variability poses a significant challenge. While many contemporary methods aim to disentangle session information from speaker embeddings, we introduce a novel approach using an additional embedding to represent the session information. This i...
Main authors: | Heo, Hee-Soo; Nam, KiHyun; Lee, Bong-Jin; Kwon, Youngki; Lee, Minjae; Kim, You Jin; Chung, Joon Son |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Heo, Hee-Soo; Nam, KiHyun; Lee, Bong-Jin; Kwon, Youngki; Lee, Minjae; Kim, You Jin; Chung, Joon Son |
description | In the field of speaker verification, session or channel variability poses a
significant challenge. While many contemporary methods aim to disentangle
session information from speaker embeddings, we introduce a novel approach
using an additional embedding to represent the session information. This is
achieved by training an auxiliary network appended to the speaker embedding
extractor, which remains fixed during this training process. This results in two
similarity scores: one for the speaker information and one for the session
information. The latter score acts as a compensator for the former, which might
be skewed by session variations. Our extensive experiments demonstrate that
session information can be effectively compensated without retraining the
embedding extractor. |
doi_str_mv | 10.48550/arxiv.2309.14741 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2309_14741</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2309_14741</sourcerecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification</title><source>arXiv.org</source><creator>Heo, Hee-Soo ; Nam, KiHyun ; Lee, Bong-Jin ; Kwon, Youngki ; Lee, Minjae ; Kim, You Jin ; Chung, Joon Son</creator><identifier>DOI: 10.48550/arxiv.2309.14741</identifier><language>eng</language><subject>Computer Science - Sound</subject><creationdate>2023-09</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa></display><links><linktorsrc>$$Uhttps://arxiv.org/abs/2309.14741$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2309.14741$$DView paper in arXiv$$Hfree_for_read</backlink></links><addata><date>2023-09-26</date><genre>article</genre><doi>10.48550/arxiv.2309.14741</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2309.14741 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2309_14741 |
source | arXiv.org |
subjects | Computer Science - Sound |
title | Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T22%3A49%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Rethinking%20Session%20Variability:%20Leveraging%20Session%20Embeddings%20for%20Session%20Robustness%20in%20Speaker%20Verification&rft.au=Heo,%20Hee-Soo&rft.date=2023-09-26&rft_id=info:doi/10.48550/arxiv.2309.14741&rft_dat=%3Carxiv_GOX%3E2309_14741%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
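
The score compensation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give the fusion formula, so the linear form (speaker score minus a weighted session score), the use of cosine similarity, the `weight` value, and all function names below are assumptions made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compensated_score(spk_enroll, spk_test, sess_enroll, sess_test, weight=0.1):
    """Hypothetical fusion: subtract a weighted session similarity.

    Assumption: a high session similarity (same channel or recording)
    may inflate the speaker score, so it is subtracted. The abstract
    only states that the session score compensates the speaker score.
    """
    speaker_score = cosine_similarity(spk_enroll, spk_test)
    session_score = cosine_similarity(sess_enroll, sess_test)
    return speaker_score - weight * session_score
```

In this sketch the speaker embeddings would come from the fixed extractor and the session embeddings from the auxiliary network; a `weight` of zero reduces the result to the plain speaker score.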