Enhancement of Dysarthric Speech Reconstruction by Contrastive Learning

Bibliographic details
Authors: Keshvari, Fatemeh; Mahdian Toroghi, Rahil; Zareian, Hassan
Format: Article
Language: English
Description
Abstract: Dysarthric speech reconstruction is challenging because of the pathological sound patterns of dysarthric speech. A key difficulty is preserving speaker identity, especially when no normal speech from the speaker is available. Our proposed approach uses contrastive learning to extract speaker embeddings for reconstruction and employs XLS-R representations instead of filter banks. The results show improved speech quality, naturalness, intelligibility, speaker identity preservation, and gender consistency for female speakers. For speakers with moderate and moderate-severe dysarthria, respectively, the reconstructed speech improves the MOS by 1.51 and 2.12 and reduces the word error rate (WER) of the Jasper speech recognition system by 25.45% and 32.1%. This approach offers promising advancements in dysarthric speech reconstruction.
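The abstract does not say which contrastive objective is used to train the speaker embeddings. As an illustration only, the sketch below assumes a supervised contrastive (SupCon-style) loss over utterance-level speaker embeddings. Embeddings from the same speaker are pulled together and embeddings from different speakers are pushed apart in cosine-similarity space. The function names, the `temperature` value, and the use of speaker labels are assumptions, not the authors' method. The encoder that produces the embeddings (for example, pooled XLS-R features) is not shown.

```python
import math
from typing import Hashable, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_matrix(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities between all embeddings in the batch."""
    unit = [l2_normalize(e) for e in embeddings]
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    speaker_ids: Sequence[Hashable],
    temperature: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """SupCon-style loss: for each anchor, average -log softmax over its positives.

    Positives are other utterances with the same speaker id; the softmax
    denominator runs over every other utterance in the batch.
    """
    if len(embeddings) != len(speaker_ids):
        raise ValueError("need exactly one speaker id per embedding")
    sim = cosine_matrix(embeddings)
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if speaker_ids[j] == speaker_ids[i]]
        if not positives:
            continue  # an anchor without a same-speaker partner contributes nothing
        logits = {j: sim[i][j] / temperature for j in others}
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    if anchors == 0:
        raise ValueError("batch needs at least two utterances from one speaker")
    return total / anchors


if __name__ == "__main__":
    embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supervised_contrastive_loss(embs, ["a", "a", "b", "b"]))  # small: speakers well separated
    print(supervised_contrastive_loss(embs, ["a", "b", "a", "b"]))  # large: labels disagree with geometry
```

With this objective, a batch whose embeddings cluster by speaker gets a lower loss than the same vectors with shuffled speaker labels. That property is what makes such embeddings usable as an identity condition during reconstruction. A practical version would compute the loss on GPU tensors inside a training loop rather than on Python lists.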
DOI:10.48550/arxiv.2410.04092