On-Device Speaker Anonymization of Acoustic Embeddings for ASR based on Flexible Location Gradient Reversal Layer
Main authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Summary: | Smart devices serviced by large-scale AI models necessitate user data
transfer to the cloud for inference. For speech applications, this means
transferring private user information, e.g., speaker identity. Our paper
proposes a privacy-enhancing framework that targets speaker identity
anonymization while preserving speech recognition accuracy for our downstream
task, Automatic Speech Recognition (ASR). The proposed framework attaches
flexible gradient-reversal-based speaker adversarial layers to target layers
within an ASR model, where speaker adversarial training anonymizes the acoustic
embeddings generated by the targeted layers to remove speaker identity. We
propose on-device deployment by executing the initial layers of the ASR model
and transmitting the anonymized embeddings to the cloud, where the rest of the
model is executed while preserving privacy. Experimental results show that our
method efficiently reduces speaker recognition accuracy by a relative 33% and
improves ASR performance, achieving a 6.2% relative Word Error Rate (WER)
reduction. |
---|---|
DOI: | 10.48550/arxiv.2307.13343 |
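
The summary relies on gradient reversal for speaker adversarial training. As a rough illustration only (this is not the authors' implementation, which the record does not include), a gradient reversal layer acts as the identity in the forward pass and multiplies the incoming gradient by a negative factor in the backward pass, so the layers below it are pushed to make the speaker classifier worse. The names `GradientReversal`, `lam`, and `encoder_gradient`, and all numeric values, are hypothetical and chosen only for this dependency-free sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    lam: float = 1.0  # hypothetical reversal strength, not a value from the paper

    def forward(self, embedding: List[float]) -> List[float]:
        # The acoustic embedding reaches the speaker classifier unchanged.
        return list(embedding)

    def backward(self, grad_from_speaker_head: List[float]) -> List[float]:
        # The speaker-loss gradient is negated and scaled before it
        # flows back into the ASR layers below the tapped layer.
        return [-self.lam * g for g in grad_from_speaker_head]


def encoder_gradient(
    grad_asr: List[float], grad_speaker: List[float], grl: GradientReversal
) -> List[float]:
    """Sum the ASR-loss gradient and the reversed speaker-loss gradient."""
    reversed_grad = grl.backward(grad_speaker)
    return [a + r for a, r in zip(grad_asr, reversed_grad)]


# Toy values (assumed) for one embedding at the tapped layer.
grl = GradientReversal(lam=0.5)
embedding = [0.25, -1.0, 3.0]
passed_through = grl.forward(embedding)

grad_asr = [0.5, 0.25, -1.0]      # gradient of the ASR loss w.r.t. the embedding
grad_speaker = [1.0, -2.0, 4.0]   # gradient of the speaker loss w.r.t. the embedding
combined = encoder_gradient(grad_asr, grad_speaker, grl)
print(passed_through, combined)
```

In an autodiff framework the same behaviour would normally be written as a custom backward rule; the manual version here only makes the sign flip explicit. Because the layer is a pass-through, it can in principle be attached at whichever layer's embeddings are to be anonymized.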