Enhancement of Dysarthric Speech Reconstruction by Contrastive Learning
Format: Article
Language: English
Abstract: Dysarthric speech reconstruction is challenging because of its pathological sound
patterns. Preserving speaker identity, especially without access to normal
speech, is a key challenge. Our proposed approach uses contrastive learning to
extract speaker embeddings for reconstruction and employs XLS-R
representations instead of filter banks. The results show improved speech
quality, naturalness, intelligibility, speaker identity preservation, and
gender consistency for female speakers. Reconstructed speech achieves MOS
improvements of 1.51 and 2.12 and reduces word error rates, measured with the
Jasper speech recognition system, by 25.45% and 32.1% for moderate and
moderate-severe dysarthric speakers, respectively. This approach offers a
promising advance in dysarthric speech reconstruction.
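The abstract does not say which contrastive objective is used to learn the speaker embeddings. The sketch below assumes a generic InfoNCE-style loss, a common choice that is not confirmed as the authors' method. Embeddings of two utterances by the same speaker are pulled together, and embeddings of the other speakers in the batch are pushed apart. The function names `cosine` and `info_nce` and the temperature of 0.1 are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(
    anchors: List[Sequence[float]],
    positives: List[Sequence[float]],
    temperature: float = 0.1,  # assumed value, for illustration only
) -> float:
    """Mean InfoNCE loss over a batch (hypothetical sketch).

    Assumption: row i of `anchors` and row i of `positives` are speaker
    embeddings of two different utterances by the same speaker, and each
    row belongs to a different speaker, so positives[j] (j != i) act as
    negatives for anchors[i].
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        # Temperature-scaled similarity of anchor i to every candidate.
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of picking the same-speaker positive.
        total += log_denominator - logits[i]
    return total / n


if __name__ == "__main__":
    same = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(info_nce(same, same))     # small: each anchor matches its own speaker
    print(info_nce(same, swapped))  # large: positives belong to the wrong speakers
```

In a real system the vectors would come from a speaker encoder applied to speech features (XLS-R representations, according to the abstract), and the loss would be minimized by gradient descent. The pure-Python version only shows how the objective is computed.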
DOI: 10.48550/arxiv.2410.04092