Lightweight and irreversible speech pseudonymization based on data-driven optimization of cascaded voice modification modules

Bibliographic details
Published in: Computer Speech & Language, 2022-03, Vol. 72, p. 101315, Article 101315
Main authors: Kai, Hiroto; Takamichi, Shinnosuke; Shiota, Sayaka; Kiya, Hitoshi
Format: Article
Language: English
Description
Summary: In this paper, we propose a speech pseudonymization framework that uses cascaded and superposition-based voice modification modules. As opportunities to use spoken dialogue systems increase, research on protecting the speaker information encapsulated in speech data is attracting attention. Pseudonymization, one method of voice privacy protection, aims to preserve the intelligibility of speech while suppressing speaker-specific information. One motivation for our framework is to achieve reliable pseudonymization performance with light computation. To this end, we combine the advantages of machine learning-based and signal processing-based approaches: (1) signal processing-based methods that are parameterized by only a few hyperparameters and (2) machine learning-based optimization of all hyperparameters against black-box systems consisting of automatic speaker verification and automatic speech recognition. Our method of cascading signal processing modules, which are jointly optimized in a data-driven manner, pseudonymizes speech in a lightweight manner. Additionally, we discuss irreversible pseudonymization approaches and propose a superposition approach, another pseudonymization method that is more irreversible than the cascade method in the sense that the parameters needed to recover the original signal are harder to estimate. Experimental results obtained under the VoicePrivacy 2020 protocols demonstrate that (1) our cascade method degrades the speaker recognition rate by over 24% while improving the speech recognition rate by approximately 8% compared with a signal processing-based baseline system of VoicePrivacy 2020 and that (2) our superposition method performs comparably to our cascade method in terms of pseudonymization performance.
• Lightweight and irreversible speech pseudonymization for protecting voice privacy.
• Cascade or superposition of signal processing-based voice modification modules.
• Hyperparameter optimization in a machine learning-based manner.
• Advantages of signal processing and machine learning lead to effective performance.
• The importance of irreversibility along with the proposal of an irreversible method.
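The idea in the summary, a few signal processing modules whose hyperparameters are tuned jointly against black-box evaluators, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the two modules (`clip`, `resample`) are toy stand-ins, `speaker_similarity` and `recognition_error` are hypothetical placeholders for real automatic speaker verification and speech recognition systems, `superpose` reflects one possible reading of "superposition" (averaging module outputs computed in parallel), and plain random search is swapped in for whatever optimizer the paper uses.

```python
import math
import random


def clip(signal, threshold):
    """Toy module (assumption): hard-clip samples to [-threshold, threshold]."""
    return [max(-threshold, min(threshold, s)) for s in signal]


def resample(signal, rate):
    """Toy module (assumption): read the signal at `rate` times its speed
    using linear interpolation, keeping the original length."""
    n = len(signal)
    out = []
    for i in range(n):
        pos = min(i * rate, n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1 - frac) * signal[lo] + frac * signal[hi])
    return out


MODULES = {"clip": clip, "resample": resample}
# Hypothetical search ranges, one hyperparameter per module.
SEARCH_SPACE = {"clip": (0.2, 1.0), "resample": (0.8, 1.2)}


def cascade(signal, params):
    """Feed the output of each module into the next one."""
    for name, func in MODULES.items():
        signal = func(signal, params[name])
    return signal


def superpose(signal, params):
    """Assumed reading of 'superposition': apply every module to the same
    input and average the resulting signals sample by sample."""
    outputs = [func(signal, params[name]) for name, func in MODULES.items()]
    return [sum(samples) / len(outputs) for samples in zip(*outputs)]


def speaker_similarity(original, modified):
    """Hypothetical ASV stand-in: absolute normalized correlation in [0, 1]."""
    dot = sum(a * b for a, b in zip(original, modified))
    norm = math.sqrt(sum(a * a for a in original) * sum(b * b for b in modified))
    return abs(dot) / norm if norm else 0.0


def recognition_error(original, modified):
    """Hypothetical ASR stand-in: mean squared distortion of the signal."""
    return sum((a - b) ** 2 for a, b in zip(original, modified)) / len(original)


def random_search(signal, transform, trials=200, seed=0, weight=1.0):
    """Black-box random search: only the evaluator scores are used, never
    gradients. Lower loss means less speaker similarity and less distortion."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(trials):
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}
        modified = transform(signal, params)
        loss = (speaker_similarity(signal, modified)
                + weight * recognition_error(signal, modified))
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


# Toy input: a short sine wave standing in for a speech waveform.
signal = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
cascade_params, cascade_loss = random_search(signal, cascade)
superpose_params, superpose_loss = random_search(signal, superpose)
```

Because the optimizer only sees scalar scores, the same loop works for either `cascade` or `superpose`, and the evaluators can be replaced without touching the modules. In a real setting the two placeholder scores would come from trained verification and recognition systems evaluated on a speech corpus.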
ISSN: 0885-2308, 1095-8363
DOI: 10.1016/j.csl.2021.101315