Ultrasound-Based Silent Speech Interface Using Convolutional and Recurrent Neural Networks


Detailed Description

Saved in:
Bibliographic Details
Published in: Acta Acustica united with Acustica, 2019-07, Vol. 105 (4), pp. 587-590
Main authors: Juanpere, Eloi Moliner; Csapó, Tamás Gábor
Format: Article
Language: English
Online access: Full text
Description
Summary: Silent Speech Interface (SSI) is a technology with the goal of synthesizing speech from articulatory motion. A Deep Neural Network based SSI using ultrasound images of the tongue as input signals and spectral coefficients of a vocoder as target parameters is proposed. Several deep learning models, such as a baseline feed-forward network and a combination of Convolutional and Recurrent Neural Networks, are presented and discussed. A pre-processing step using a Deep Convolutional AutoEncoder was also studied. According to the experimental results, an architecture based on CNN and bidirectional LSTM layers achieved the best objective and subjective results.
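The best-performing architecture described in the summary, a per-frame CNN feature extractor followed by bidirectional LSTM layers that regress vocoder spectral coefficients, can be sketched roughly as below. This is a minimal PyTorch illustration, not the authors' implementation: the image size, channel counts, hidden size, and number of spectral coefficients are placeholder assumptions.

```python
import torch
import torch.nn as nn


class CNNBiLSTM(nn.Module):
    """Map a sequence of ultrasound tongue images to vocoder spectral
    coefficients: CNN per frame, then a bidirectional LSTM over time.
    All layer sizes here are illustrative assumptions."""

    def __init__(self, n_coeffs: int = 25):
        super().__init__()
        # Per-frame convolutional feature extractor (grayscale input).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),  # fixed-size feature map
            nn.Flatten(),                  # -> 32 * 4 * 4 = 512 features
        )
        # Bidirectional LSTM models temporal context in both directions.
        self.lstm = nn.LSTM(512, 64, batch_first=True, bidirectional=True)
        # Linear readout to the vocoder's spectral coefficients per frame.
        self.out = nn.Linear(2 * 64, n_coeffs)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, 1, height, width)
        b, t = x.shape[:2]
        feats = self.cnn(x.reshape(b * t, *x.shape[2:]))  # (b*t, 512)
        feats = feats.reshape(b, t, -1)                   # (b, t, 512)
        hidden, _ = self.lstm(feats)                      # (b, t, 128)
        return self.out(hidden)                           # (b, t, n_coeffs)


# Example: a batch of 2 sequences, 10 frames of 64x128 ultrasound images.
model = CNNBiLSTM(n_coeffs=25)
coeffs = model(torch.zeros(2, 10, 1, 64, 128))
print(coeffs.shape)  # torch.Size([2, 10, 25])
```

The bidirectional LSTM lets each frame's prediction draw on both past and future articulator motion, which is consistent with the abstract's finding that the CNN + BiLSTM combination outperformed the purely feed-forward baseline.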
ISSN:1610-1928
DOI:10.3813/AAA.919339