Speaker anonymization using orthogonal Householder neural network

Speaker anonymization aims to conceal a speaker's identity while preserving content information in speech. Current mainstream neural-network speaker anonymization systems disentangle speech into prosody-related, content, and speaker representations. The speaker representation is then anonymized...

Bibliographic Details
Main Authors:	Miao, Xiaoxiao; Wang, Xin; Cooper, Erica; Yamagishi, Junichi; Tomashenko, Natalia
Format:	Article
Language:	eng
Subjects:	Computer Science - Sound
Online Access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Miao, Xiaoxiao; Wang, Xin; Cooper, Erica; Yamagishi, Junichi; Tomashenko, Natalia
description	Speaker anonymization aims to conceal a speaker's identity while preserving content information in speech. Current mainstream neural-network speaker anonymization systems disentangle speech into prosody-related, content, and speaker representations. The speaker representation is then anonymized by a selection-based speaker anonymizer that uses a mean vector over a set of randomly selected speaker vectors from an external pool of English speakers. However, the resulting anonymized vectors are subject to severe privacy leakage against powerful attackers, reduction in speaker diversity, and language mismatch problems for unseen-language speaker anonymization. To generate diverse, language-neutral speaker vectors, this paper proposes an anonymizer based on an orthogonal Householder neural network (OHNN). Specifically, the OHNN acts like a rotation to transform the original speaker vectors into anonymized speaker vectors, which are constrained to follow the distribution over the original speaker vector space. A basic classification loss is introduced to ensure that anonymized speaker vectors from different speakers have unique speaker identities. To further protect speaker identities, an improved classification loss and similarity loss are used to push original-anonymized sample pairs away from each other. Experiments on VoicePrivacy Challenge datasets in English and the AISHELL-3 dataset in Mandarin demonstrate the proposed anonymizer's effectiveness.
doi_str_mv	10.48550/arxiv.2305.18823
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2305_18823</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2305_18823</sourcerecordid><originalsourceid>FETCH-LOGICAL-a673-92c5cd3b99f57beac1cb20d65ffba0eb216b1f7e2c0f8eb986a7241ab2e543073</originalsourceid><addsrcrecordid>eNotz01uwjAUBGBvWFTQA3TVXCDBP3HsLBGCUgmpi7KPnp1nsAg2chJaevpSymqk0Wikj5AXRotSS0nnkL79peCCyoJpzcUTWXyeEY6YMggxXE_-BwYfQzb2PuyzmIZD3McAXbaJY4-H2LW3acAx3aqAw1dMxxmZOOh6fH7klOzWq91yk28_3t6Xi20OlRJ5za20rTB17aQyCJZZw2lbSecMUDScVYY5hdxSp9HUugLFSwaGoywFVWJKXv9v74bmnPwJ0rX5szR3i_gFhotGPA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Speaker anonymization using orthogonal Householder neural network</title><source>arXiv.org</source><creator>Miao, Xiaoxiao ; Wang, Xin ; Cooper, Erica ; Yamagishi, Junichi ; Tomashenko, Natalia</creator><creatorcontrib>Miao, Xiaoxiao ; Wang, Xin ; Cooper, Erica ; Yamagishi, Junichi ; Tomashenko, Natalia</creatorcontrib><description>Speaker anonymization aims to conceal a speaker's identity while preserving content information in speech. Current mainstream neural-network speaker anonymization systems disentangle speech into prosody-related, content, and speaker representations. The speaker representation is then anonymized by a selection-based speaker anonymizer that uses a mean vector over a set of randomly selected speaker vectors from an external pool of English speakers. However, the resulting anonymized vectors are subject to severe privacy leakage against powerful attackers, reduction in speaker diversity, and language mismatch problems for unseen-language speaker anonymization. To generate diverse, language-neutral speaker vectors, this paper proposes an anonymizer based on an orthogonal Householder neural network (OHNN). 
Specifically, the OHNN acts like a rotation to transform the original speaker vectors into anonymized speaker vectors, which are constrained to follow the distribution over the original speaker vector space. A basic classification loss is introduced to ensure that anonymized speaker vectors from different speakers have unique speaker identities. To further protect speaker identities, an improved classification loss and similarity loss are used to push original-anonymized sample pairs away from each other. Experiments on VoicePrivacy Challenge datasets in English and the \textit{AISHELL-3} dataset in Mandarin demonstrate the proposed anonymizer's effectiveness.</description><identifier>DOI: 10.48550/arxiv.2305.18823</identifier><language>eng</language><subject>Computer Science - Sound</subject><creationdate>2023-05</creationdate><rights>http://creativecommons.org/licenses/by-nc-nd/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2305.18823$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2305.18823$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Miao, Xiaoxiao</creatorcontrib><creatorcontrib>Wang, Xin</creatorcontrib><creatorcontrib>Cooper, Erica</creatorcontrib><creatorcontrib>Yamagishi, Junichi</creatorcontrib><creatorcontrib>Tomashenko, Natalia</creatorcontrib><title>Speaker anonymization using orthogonal Householder neural network</title><description>Speaker anonymization aims to conceal a speaker's identity while preserving content information in speech. 
Current mainstream neural-network speaker anonymization systems disentangle speech into prosody-related, content, and speaker representations. The speaker representation is then anonymized by a selection-based speaker anonymizer that uses a mean vector over a set of randomly selected speaker vectors from an external pool of English speakers. However, the resulting anonymized vectors are subject to severe privacy leakage against powerful attackers, reduction in speaker diversity, and language mismatch problems for unseen-language speaker anonymization. To generate diverse, language-neutral speaker vectors, this paper proposes an anonymizer based on an orthogonal Householder neural network (OHNN). Specifically, the OHNN acts like a rotation to transform the original speaker vectors into anonymized speaker vectors, which are constrained to follow the distribution over the original speaker vector space. A basic classification loss is introduced to ensure that anonymized speaker vectors from different speakers have unique speaker identities. To further protect speaker identities, an improved classification loss and similarity loss are used to push original-anonymized sample pairs away from each other. 
Experiments on VoicePrivacy Challenge datasets in English and the \textit{AISHELL-3} dataset in Mandarin demonstrate the proposed anonymizer's effectiveness.</description><subject>Computer Science - Sound</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotz01uwjAUBGBvWFTQA3TVXCDBP3HsLBGCUgmpi7KPnp1nsAg2chJaevpSymqk0Wikj5AXRotSS0nnkL79peCCyoJpzcUTWXyeEY6YMggxXE_-BwYfQzb2PuyzmIZD3McAXbaJY4-H2LW3acAx3aqAw1dMxxmZOOh6fH7klOzWq91yk28_3t6Xi20OlRJ5za20rTB17aQyCJZZw2lbSecMUDScVYY5hdxSp9HUugLFSwaGoywFVWJKXv9v74bmnPwJ0rX5szR3i_gFhotGPA</recordid><startdate>20230530</startdate><enddate>20230530</enddate><creator>Miao, Xiaoxiao</creator><creator>Wang, Xin</creator><creator>Cooper, Erica</creator><creator>Yamagishi, Junichi</creator><creator>Tomashenko, Natalia</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20230530</creationdate><title>Speaker anonymization using orthogonal Householder neural network</title><author>Miao, Xiaoxiao ; Wang, Xin ; Cooper, Erica ; Yamagishi, Junichi ; Tomashenko, Natalia</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a673-92c5cd3b99f57beac1cb20d65ffba0eb216b1f7e2c0f8eb986a7241ab2e543073</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Sound</topic><toplevel>online_resources</toplevel><creatorcontrib>Miao, Xiaoxiao</creatorcontrib><creatorcontrib>Wang, Xin</creatorcontrib><creatorcontrib>Cooper, Erica</creatorcontrib><creatorcontrib>Yamagishi, Junichi</creatorcontrib><creatorcontrib>Tomashenko, Natalia</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Miao, Xiaoxiao</au><au>Wang, Xin</au><au>Cooper, 
Erica</au><au>Yamagishi, Junichi</au><au>Tomashenko, Natalia</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Speaker anonymization using orthogonal Householder neural network</atitle><date>2023-05-30</date><risdate>2023</risdate><abstract>Speaker anonymization aims to conceal a speaker's identity while preserving content information in speech. Current mainstream neural-network speaker anonymization systems disentangle speech into prosody-related, content, and speaker representations. The speaker representation is then anonymized by a selection-based speaker anonymizer that uses a mean vector over a set of randomly selected speaker vectors from an external pool of English speakers. However, the resulting anonymized vectors are subject to severe privacy leakage against powerful attackers, reduction in speaker diversity, and language mismatch problems for unseen-language speaker anonymization. To generate diverse, language-neutral speaker vectors, this paper proposes an anonymizer based on an orthogonal Householder neural network (OHNN). Specifically, the OHNN acts like a rotation to transform the original speaker vectors into anonymized speaker vectors, which are constrained to follow the distribution over the original speaker vector space. A basic classification loss is introduced to ensure that anonymized speaker vectors from different speakers have unique speaker identities. To further protect speaker identities, an improved classification loss and similarity loss are used to push original-anonymized sample pairs away from each other. Experiments on VoicePrivacy Challenge datasets in English and the \textit{AISHELL-3} dataset in Mandarin demonstrate the proposed anonymizer's effectiveness.</abstract><doi>10.48550/arxiv.2305.18823</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2305.18823
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2305_18823
source	arXiv.org
subjects	Computer Science - Sound
title	Speaker anonymization using orthogonal Householder neural network
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-18T23%3A13%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Speaker%20anonymization%20using%20orthogonal%20Householder%20neural%20network&rft.au=Miao,%20Xiaoxiao&rft.date=2023-05-30&rft_id=info:doi/10.48550/arxiv.2305.18823&rft_dat=%3Carxiv_GOX%3E2305_18823%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true