Speaker anonymization using orthogonal Householder neural network
Speaker anonymization aims to conceal a speaker's identity while preserving content information in speech. Current mainstream neural-network speaker anonymization systems disentangle speech into prosody-related, content, and speaker representations. The speaker representation is then anonymized...
Main authors: | Miao, Xiaoxiao; Wang, Xin; Cooper, Erica; Yamagishi, Junichi; Tomashenko, Natalia |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Miao, Xiaoxiao; Wang, Xin; Cooper, Erica; Yamagishi, Junichi; Tomashenko, Natalia |
description | Speaker anonymization aims to conceal a speaker's identity while preserving
content information in speech. Current mainstream neural-network speaker
anonymization systems disentangle speech into prosody-related, content, and
speaker representations. The speaker representation is then anonymized by a
selection-based speaker anonymizer that uses a mean vector over a set of
randomly selected speaker vectors from an external pool of English speakers.
However, the resulting anonymized vectors are subject to severe privacy leakage
against powerful attackers, reduction in speaker diversity, and language
mismatch problems for unseen-language speaker anonymization. To generate
diverse, language-neutral speaker vectors, this paper proposes an anonymizer
based on an orthogonal Householder neural network (OHNN). Specifically, the
OHNN acts like a rotation to transform the original speaker vectors into
anonymized speaker vectors, which are constrained to follow the distribution
over the original speaker vector space. A basic classification loss is
introduced to ensure that anonymized speaker vectors from different speakers
have unique speaker identities. To further protect speaker identities, an
improved classification loss and similarity loss are used to push
original-anonymized sample pairs away from each other. Experiments on
VoicePrivacy Challenge datasets in English and the AISHELL-3 dataset
in Mandarin demonstrate the proposed anonymizer's effectiveness. |
doi_str_mv | 10.48550/arxiv.2305.18823 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2305_18823</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2305_18823</sourcerecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Speaker anonymization using orthogonal Householder neural network</title><source>arXiv.org</source><creator>Miao, Xiaoxiao ; Wang, Xin ; Cooper, Erica ; Yamagishi, Junichi ; Tomashenko, Natalia</creator><identifier>DOI: 10.48550/arxiv.2305.18823</identifier><language>eng</language><subject>Computer Science - Sound</subject><creationdate>2023-05</creationdate><rights>http://creativecommons.org/licenses/by-nc-nd/4.0</rights><oa>free_for_read</oa></display><links><linktorsrc>$$Uhttps://arxiv.org/abs/2305.18823$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2305.18823$$DView paper in arXiv$$Hfree_for_read</backlink></links><addata><au>Miao, Xiaoxiao</au><au>Wang, Xin</au><au>Cooper, Erica</au><au>Yamagishi, Junichi</au><au>Tomashenko, Natalia</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Speaker anonymization using orthogonal Householder neural network</atitle><date>2023-05-30</date><risdate>2023</risdate><doi>10.48550/arxiv.2305.18823</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2305.18823 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2305_18823 |
source | arXiv.org |
subjects | Computer Science - Sound |
title | Speaker anonymization using orthogonal Householder neural network |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-18T23%3A13%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Speaker%20anonymization%20using%20orthogonal%20Householder%20neural%20network&rft.au=Miao,%20Xiaoxiao&rft.date=2023-05-30&rft_id=info:doi/10.48550/arxiv.2305.18823&rft_dat=%3Carxiv_GOX%3E2305_18823%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
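
The description above contrasts a selection-based anonymizer (a mean over randomly selected pool vectors) with an orthogonal, rotation-like transform. The following is only a minimal NumPy sketch of those two ideas, not the paper's implementation: this record does not give the OHNN architecture, vector dimension, or loss formulas, so the dimension `dim`, the number of reflections, the random stand-ins for learned parameters, and all function names are assumptions made for illustration. It relies on the standard fact that a product of Householder reflections is an orthogonal matrix.

```python
import numpy as np


def householder_matrix(v):
    """Return the reflection H = I - 2 v v^T / (v^T v) for a nonzero vector v."""
    v = np.asarray(v, dtype=float)
    return np.eye(v.size) - 2.0 * np.outer(v, v) / np.dot(v, v)


def orthogonal_from_householder(vs):
    """Multiply the reflections built from each row of vs into one orthogonal matrix."""
    w = np.eye(vs.shape[1])
    for v in vs:
        w = w @ householder_matrix(v)
    return w


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (used here as a simple similarity proxy)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def mean_of_random_pool(pool, k, rng):
    """Selection-based baseline: average k randomly chosen pool vectors."""
    idx = rng.choice(len(pool), size=k, replace=False)
    return pool[idx].mean(axis=0)


rng = np.random.default_rng(0)
dim = 8  # hypothetical speaker-vector size, chosen only for the demo

# Random stand-ins for parameters that a trained model would learn.
reflection_vectors = rng.normal(size=(4, dim))
W = orthogonal_from_householder(reflection_vectors)

x = rng.normal(size=dim)  # stand-in for an original speaker vector
x_anon = W @ x            # orthogonal transform: length is preserved

pool = rng.normal(size=(50, dim))  # stand-in for an external speaker pool
x_baseline = mean_of_random_pool(pool, k=10, rng=rng)

print("W is orthogonal:", np.allclose(W.T @ W, np.eye(dim)))
print("norm before/after:", np.linalg.norm(x), np.linalg.norm(x_anon))
print("cosine(original, anonymized):", cosine_similarity(x, x_anon))
```

An even number of reflections gives a matrix with determinant +1, which is why the sketch uses four of them to mimic a rotation. In a trained system the reflection vectors would be optimized, for example so that the cosine similarity between original and anonymized vectors is pushed down, as the description's similarity loss suggests; the exact objective is not stated in this record.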