Face Mask Extraction in Video Sequence

Bibliographic Details
Published in: International Journal of Computer Vision, 2019-06, Vol. 127 (6-7), pp. 625-641
Main authors: Wang, Yujiang; Luo, Bingnan; Shen, Jie; Pantic, Maja
Format: Article
Language: English
Online access: Full text
Description
Abstract: Inspired by recent developments in deep network-based methods for semantic image segmentation, we introduce an end-to-end trainable model for face mask extraction in video sequences. Compared to landmark-based sparse face shape representations, our method can produce segmentation masks of individual facial components, which better reflect their detailed shape variations. By integrating the convolutional LSTM (ConvLSTM) algorithm with fully convolutional networks (FCN), our new ConvLSTM-FCN model works on a per-sequence basis and takes advantage of the temporal correlation in video clips. In addition, we propose a novel loss function, called segmentation loss, to directly optimise intersection-over-union (IoU) performance. In practice, to further increase segmentation accuracy, one primary model and two additional models were trained to focus on the face, eyes, and mouth regions, respectively. Our experiments show that the proposed method achieves a 16.99% relative improvement (from 54.50% to 63.76% mean IoU) over the baseline FCN model on the 300 Videos in the Wild (300VW) dataset.
ISSN: 0920-5691, 1573-1405
DOI: 10.1007/s11263-018-1130-2
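
The abstract does not give the exact form of the proposed segmentation loss, but a standard way to optimise IoU directly is a differentiable "soft IoU", in which the hard intersection and union counts are replaced by sums over predicted probabilities. The PyTorch sketch below is a minimal illustration of that idea for a single facial-component mask; the name soft_iou_loss and all implementation details are our own assumptions, not the paper's formulation.

import torch

def soft_iou_loss(logits, targets, eps=1e-6):
    """Differentiable (1 - soft IoU) for binary segmentation masks.

    logits:  (N, H, W) raw network scores for one facial component
    targets: (N, H, W) binary ground-truth masks (values 0 or 1)
    """
    probs = torch.sigmoid(logits)                                 # soft predictions in [0, 1]
    inter = (probs * targets).sum(dim=(1, 2))                     # soft intersection per image
    union = (probs + targets - probs * targets).sum(dim=(1, 2))   # soft union per image
    iou = (inter + eps) / (union + eps)                           # eps guards empty masks
    return (1.0 - iou).mean()                                     # minimising 1 - IoU maximises IoU

Calling soft_iou_loss(logits, masks) on a batch yields a scalar that can be backpropagated like any other loss. For reference, the reported relative improvement follows directly from the mean-IoU figures: (63.76 - 54.50) / 54.50 ≈ 16.99%.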