Jointly Trained Conversion Model With LPCNet for Any-to-One Voice Conversion Using Speaker-Independent Linguistic Features

Detailed Description

Bibliographic Details
Published in: IEEE Access, 2022, Vol. 10, pp. 134029-134037
Main Authors: Himawan, Ivan; Wang, Ruizhe; Sridharan, Sridha; Fookes, Clinton
Format: Article
Language: English
Online Access: Full Text
Description
Summary: We propose a joint training scheme for an any-to-one voice conversion (VC) system with LPCNet to improve the naturalness, speaker similarity, and intelligibility of the converted speech. Recent advancements in neural vocoders such as LPCNet have enabled the production of more natural and clearer speech. However, the other components of a typical VC system, such as the conversion model, are often designed independently, so each component is trained with a separate strategy that does not directly match the vocoder's training objective, which prevents LPCNet's full potential from being exploited. We address this problem by training the conversion model and LPCNet jointly. To accurately capture the linguistic content of a given utterance, we use speaker-independent (SI) features derived from an automatic speech recognition (ASR) model trained on a mixed-language speech corpus. A conversion model then maps the SI features to the acoustic representations used as input features to LPCNet. We also explore synthesizing cross-language speech with the proposed approach. Experimental results show that the proposed model achieves real-time VC, unlocks the full potential of LPCNet, and outperforms the state of the art.
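
To make the joint training idea concrete, the sketch below shows one way the coupling could look in PyTorch. It is a minimal, hypothetical illustration rather than the authors' implementation: the module names, layer sizes, and 1:1 loss weighting are assumptions, and ToyVocoder only stands in for LPCNet's actual frame-rate and sample-rate networks; just the 20-dimensional feature layout (18 Bark-scale cepstra plus pitch period and correlation) follows the public LPCNet convention. The point it demonstrates is that the vocoder's sample-level loss backpropagates through the predicted acoustic features into the conversion model, so both components are optimized under one shared objective instead of being trained separately.

    # Hypothetical sketch of joint conversion-model + vocoder training;
    # not the authors' code. ToyVocoder is a drastically simplified
    # stand-in for LPCNet's frame-rate and sample-rate networks.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConversionModel(nn.Module):
        """Maps speaker-independent ASR features to LPCNet-style features
        (18 Bark cepstra + 2 pitch parameters = 20 dims per frame)."""
        def __init__(self, asr_dim=256, feat_dim=20, hidden=256):
            super().__init__()
            self.rnn = nn.GRU(asr_dim, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
            self.proj = nn.Linear(2 * hidden, feat_dim)

        def forward(self, asr_feats):              # (B, T, asr_dim)
            h, _ = self.rnn(asr_feats)
            return self.proj(h)                    # (B, T, feat_dim)

    class ToyVocoder(nn.Module):
        """GRU conditioned on upsampled frame features and the previous
        mu-law sample; predicts 256-way logits per audio sample."""
        def __init__(self, feat_dim=20, hidden=256, frame_size=160):
            super().__init__()
            self.frame_size = frame_size
            self.embed = nn.Embedding(256, 64)
            self.gru = nn.GRU(feat_dim + 64, hidden, batch_first=True)
            self.out = nn.Linear(hidden, 256)

        def forward(self, feats, mulaw):           # (B,T,F), (B,T*frame_size)
            cond = feats.repeat_interleave(self.frame_size, dim=1)
            prev = F.pad(mulaw[:, :-1], (1, 0))    # teacher forcing: x_{t-1}
            x = torch.cat([cond, self.embed(prev)], dim=-1)
            h, _ = self.gru(x)
            return self.out(h)                     # (B, S, 256)

    conv_model, vocoder = ConversionModel(), ToyVocoder()
    opt = torch.optim.Adam(
        list(conv_model.parameters()) + list(vocoder.parameters()), lr=1e-4)

    # Dummy batch standing in for (ASR features, target features, mu-law audio).
    asr_feats = torch.randn(2, 50, 256)
    target_feats = torch.randn(2, 50, 20)
    target_mulaw = torch.randint(0, 256, (2, 50 * 160))

    pred_feats = conv_model(asr_feats)
    # Feature-level loss keeps the conversion model near the target features,
    feat_loss = F.l1_loss(pred_feats, target_feats)
    # while the vocoder's sample-level loss flows back through pred_feats,
    # coupling both components under a single training signal.
    logits = vocoder(pred_feats, target_mulaw)
    wave_loss = F.cross_entropy(logits.reshape(-1, 256),
                                target_mulaw.reshape(-1))
    loss = wave_loss + feat_loss                   # illustrative 1:1 weighting
    opt.zero_grad(); loss.backward(); opt.step()

In a separately trained pipeline, only feat_loss would ever reach the conversion model; here the gradient of wave_loss also flows through pred_feats, which is the coupling the abstract credits with unlocking LPCNet's full potential.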
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2022.3226350