Tibetan speech synthesis based on an improved neural network

Nowadays, Tibetan speech synthesis based on neural network has become the mainstream synthesis method. Among them, the griffin-lim vocoder is widely used in Tibetan speech synthesis because of its relatively simple synthesis.Aiming at the problem of low fidelity of griffin-lim vocoder, this paper us...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	MATEC web of conferences 2021, Vol.336, p.6012
Hauptverfasser:	Ding, Yuntao, Cai, Rangzhuoma, Gong, Baojia
Format:	Artikel
Sprache:	eng
Schlagworte:	Feature extraction Forecasting Neural networks Speech Speech recognition Vocoders Waveforms
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Nowadays, Tibetan speech synthesis based on neural network has become the mainstream synthesis method. Among them, the griffin-lim vocoder is widely used in Tibetan speech synthesis because of its relatively simple synthesis.Aiming at the problem of low fidelity of griffin-lim vocoder, this paper uses WaveNet vocoder instead of griffin-lim for Tibetan speech synthesis.This paper first uses convolution operation and attention mechanism to extract sequence features.And then uses linear projection and feature amplification module to predict mel spectrogram.Finally, use WaveNet vocoder to synthesize speech waveform. Experimental data shows that our model has a better performance in Tibetan speech synthesis.
ISSN:	2261-236X 2274-7214 2261-236X
DOI:	10.1051/matecconf/202133606012