Lip Reading Using Committee Networks With Two Different Types of Concatenated Frame Images

Bibliographic details
Published in:	IEEE Access 2019, Vol. 7, pp. 90125-90131
Main authors:	Jang, Dong-Won, Kim, Hong-In, Je, Changsoo, Park, Rae-Hong, Park, Hyung-Min
Format:	Article
Language:	English
Keywords:	Artificial neural networks; Committee networks; concatenated frame images; convolutional neural networks; Datasets; Kernel; Lip reading; Lips; Neurons; Speech recognition; time-based label-preserving transform; Training; Transforms; visual speech recognition; Visualization

Description
Abstract:	This paper proposes a lip-reading method based on convolutional neural networks (CNNs) applied to two types of concatenated frame images (CFIs): (a) full-lip images and (b) patches around lip landmarks. In addition, we introduce committee networks that combine the predictions obtained from both types of CFIs; these outperform single networks and committee networks that use only one type of CFI. For efficient training on a limited dataset such as OuluVS2, we propose a time-based label-preserving transform and use a quarter VGG-m, which has fewer parameters than the VGG-m. Experimental results on the OuluVS2 dataset show that the proposed method, using both types of CFIs in committee networks, outperformed the state-of-the-art methods without pre-training on a large-scale dataset.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2019.2927166
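The abstract does not say how frames are arranged inside a CFI or how the committee merges its members' outputs. The following is a minimal sketch under two assumptions that are not taken from the paper: equally sized frames are tiled row-major into one grid image, and the committee averages the class-probability vectors of its members and picks the highest-scoring class. The function names `concatenate_frames` and `committee_predict` are hypothetical, and the CNNs themselves (e.g. the quarter VGG-m) are not shown; plain probability lists stand in for their softmax outputs.

```python
from typing import List, Sequence

# A grayscale frame (or lip-landmark patch) as rows of pixel values.
Image = List[List[float]]


def concatenate_frames(frames: Sequence[Image], cols: int) -> Image:
    """Tile equally sized frames into one grid image (a CFI).

    Assumption for illustration: frames are placed left to right, top to
    bottom, and the number of frames is a positive multiple of `cols`.
    """
    if cols <= 0 or not frames or len(frames) % cols != 0:
        raise ValueError("number of frames must be a positive multiple of cols")
    h, w = len(frames[0]), len(frames[0][0])
    if any(len(f) != h or any(len(row) != w for row in f) for f in frames):
        raise ValueError("all frames must have the same size")
    grid: Image = []
    for start in range(0, len(frames), cols):
        row_frames = frames[start:start + cols]
        for y in range(h):
            # Join pixel row y of every frame in this grid row side by side.
            grid.append([px for f in row_frames for px in f[y]])
    return grid


def committee_predict(prob_sets: Sequence[Sequence[float]]) -> int:
    """Average the class-probability vectors of several networks.

    Returns the index of the class with the highest mean probability
    (ties go to the lowest index).
    """
    if not prob_sets:
        raise ValueError("committee needs at least one member")
    n_classes = len(prob_sets[0])
    avg = [sum(p[c] for p in prob_sets) / len(prob_sets) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)


if __name__ == "__main__":
    # Four 2x2 frames tiled into a 2x2 grid give one 4x4 CFI.
    frames = [[[i, i], [i, i]] for i in range(4)]
    for row in concatenate_frames(frames, cols=2):
        print(row)
    # Hypothetical outputs: one full-lip CFI network, two landmark-patch networks.
    print(committee_predict([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]))
```

Mixing members trained on the two CFI types in one committee is what the abstract credits for the gain over single-type committees; in this sketch that simply means passing probability vectors from both kinds of networks to `committee_predict`.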