Learning Expressionlets via Universal Manifold Model for Dynamic Facial Expression Recognition

Facial expression is a temporally dynamic event which can be decomposed into a set of muscle motions occurring in different facial regions over various time intervals. For dynamic expression recognition, two key issues, temporal alignment and semantics-aware dynamic representation, must be taken int...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on image processing 2016-12, Vol.25 (12), p.5920-5932
Hauptverfasser:	Liu, Mengyi, Shan, Shiguang, Wang, Ruiping, Chen, Xilin
Format:	Artikel
Sprache:	eng
Schlagworte:	Computational modeling discriminant Learning Dynamics expressionlets Face recognition Facial expression recognition Feature extraction Hidden Markov models Manifolds Riemannian manifold universal manifold model Videos
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Facial expression is a temporally dynamic event which can be decomposed into a set of muscle motions occurring in different facial regions over various time intervals. For dynamic expression recognition, two key issues, temporal alignment and semantics-aware dynamic representation, must be taken into account. In this paper, we attempt to solve both problems via manifold modeling of videos based on a novel mid-level representation, i.e., expressionlet. Specifically, our method contains three key stages: 1) each expression video clip is characterized as a spatial-temporal manifold (STM) formed by dense low-level features; 2) a universal manifold model (UMM) is learned over all low-level features and represented as a set of local modes to statistically unify all the STMs; and 3) the local modes on each STM can be instantiated by fitting to the UMM, and the corresponding expressionlet is constructed by modeling the variations in each local mode. With the above strategy, expression videos are naturally aligned both spatially and temporally. To enhance the discriminative power, the expressionlet-based STM representation is further processed with discriminant embedding. Our method is evaluated on four public expression databases, CK+, MMI, Oulu-CASIA, and FERA. In all cases, our method outperforms the known state of the art by a large margin.
ISSN:	1057-7149 1941-0042
DOI:	10.1109/TIP.2016.2615424