Ultrasound-Based Silent Speech Interface Using Convolutional and Recurrent Neural Networks
Published in: | Acta Acustica united with Acustica, 2019-07, Vol. 105 (4), pp. 587-590 |
Main authors: | , |
Format: | Article |
Language: | English |
Online access: | Full text |
Abstract: | Silent Speech Interface (SSI) is a technology with the goal of synthesizing speech from articulatory motion. A Deep Neural Network (DNN) based SSI is proposed, using ultrasound images of the tongue as input signals and the spectral coefficients of a vocoder as target parameters. Several deep learning models, such as a baseline Feed-forward network and combinations of Convolutional and Recurrent Neural Networks, are presented and discussed. A pre-processing step using a Deep Convolutional AutoEncoder was also studied. According to the experimental results, an architecture based on CNN and bidirectional LSTM layers has shown the best objective and subjective results (see the sketch after this record). |
ISSN: | 1610-1928 |
DOI: | 10.3813/AAA.919339 |
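
To make the best-performing architecture from the abstract concrete, here is a minimal sketch of a CNN front end followed by bidirectional LSTM layers, mapping a sequence of ultrasound tongue images to per-frame vocoder spectral coefficients. This is not the paper's exact model: the image size (64x128), the number of spectral coefficients (25), the layer sizes, and the use of PyTorch are all illustrative assumptions.

```python
# Minimal sketch of a CNN + bidirectional LSTM SSI regressor.
# All shapes and hyperparameters are assumptions, not values from the paper.
import torch
import torch.nn as nn

class CnnBiLstmSSI(nn.Module):
    def __init__(self, n_coeffs: int = 25, hidden: int = 256):
        super().__init__()
        # Per-frame image encoder: three strided convolutions, then pooling
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # Bidirectional LSTM models context across the frame sequence
        self.rnn = nn.LSTM(64 * 4 * 4, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        # Linear head regresses vocoder spectral coefficients per frame
        self.out = nn.Linear(2 * hidden, n_coeffs)

    def forward(self, x):                      # x: (batch, time, 1, H, W)
        b, t = x.shape[:2]
        f = self.cnn(x.flatten(0, 1))          # (b*t, 64, 4, 4)
        f = f.reshape(b, t, -1)                # (b, t, 1024)
        h, _ = self.rnn(f)                     # (b, t, 2*hidden)
        return self.out(h)                     # (b, t, n_coeffs)

# Example: 8 frames of assumed 64x128 ultrasound images -> 25 coefficients/frame
frames = torch.randn(2, 8, 1, 64, 128)
print(CnnBiLstmSSI()(frames).shape)            # torch.Size([2, 8, 25])
```

In a setup like this, training would typically minimize a mean-squared error between predicted and reference vocoder coefficients, and speech would be synthesized by passing the predicted coefficients through the vocoder; the Deep Convolutional AutoEncoder pre-processing step mentioned in the abstract (compressing each ultrasound frame before the sequence model) is not shown here.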