Unsupervised Facial Expression Representation Learning with Contrastive Local Warping
Format: Article
Language: English
Abstract: This paper investigates unsupervised representation learning for facial expression analysis. We argue that Unsupervised Facial Expression Representation (UFER) deserves exploration and has the potential to address key challenges in facial expression analysis, such as scaling, annotation bias, the discrepancy between discrete labels and continuous emotions, and model pre-training. Thus motivated, we propose a UFER method with contrastive local warping (ContraWarping), which leverages the insight that facial expression is robust to common global transformations (affine transformation, color jitter, etc.) but can be easily changed by random local warping. Given a facial image, ContraWarping therefore employs global transformations and local warping to generate its positive and negative samples and sets up a novel contrastive learning framework. Our in-depth investigation shows that: 1) the positive pairs from global transformations can be exploited with general self-supervised learning (e.g., BYOL) and already yield informative features, and 2) the negative pairs from local warping explicitly introduce expression-related variation and bring substantial further improvement. Based on ContraWarping, we demonstrate the benefit of UFER in two facial expression analysis scenarios: facial expression recognition and image retrieval. For example, directly using ContraWarping features for linear probing achieves 79.14% accuracy on RAF-DB, substantially narrowing the gap to the fully supervised counterpart (88.92% / 84.81% with/without pre-training).
DOI: 10.48550/arxiv.2303.09034
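To make the recipe in the abstract concrete, here is a minimal PyTorch sketch of the sample-generation and contrastive steps. The abstract names the ingredients (global transformations as positives, local warping as negatives, a contrastive objective) but not the implementation, so the specific warping scheme, augmentation parameters, `encoder` interface, and InfoNCE-style loss below are illustrative assumptions, not the paper's exact method.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms as T

# Expression-preserving global transformations named in the abstract
# (affine transformation, color jitter, etc.); parameters are assumptions.
global_aug = T.Compose([
    T.RandomResizedCrop(112, scale=(0.8, 1.0)),
    T.RandomAffine(degrees=10, translate=(0.05, 0.05)),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
])

def local_warp(img, grid_size=4, strength=0.1):
    """Expression-altering local warp (a stand-in for the paper's warping).

    Small random offsets on a coarse control grid are upsampled into a
    smooth dense flow field; resampling through it deforms the face locally.
    """
    b, c, h, w = img.shape
    # Identity sampling grid with coordinates in [-1, 1].
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=img.device),
        torch.linspace(-1, 1, w, device=img.device), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    # Coarse random offsets -> smooth dense displacement field.
    coarse = (torch.rand(b, 2, grid_size, grid_size, device=img.device)
              - 0.5) * 2 * strength
    flow = F.interpolate(coarse, size=(h, w), mode="bilinear",
                         align_corners=True).permute(0, 2, 3, 1)
    return F.grid_sample(img, base + flow, align_corners=True)

def contrawarping_loss(encoder, img, tau=0.1):
    """Pull globally transformed views together; push the warped view away."""
    z1 = F.normalize(encoder(global_aug(img)), dim=1)   # positive view 1
    z2 = F.normalize(encoder(global_aug(img)), dim=1)   # positive view 2
    zn = F.normalize(encoder(local_warp(img)), dim=1)   # warped negative
    pos = (z1 * z2).sum(dim=1) / tau
    neg = (z1 * zn).sum(dim=1) / tau
    # InfoNCE-style objective: index 0 (the positive pair) is correct.
    logits = torch.stack((pos, neg), dim=1)
    target = torch.zeros(len(logits), dtype=torch.long, device=img.device)
    return F.cross_entropy(logits, target)
```

For the linear-probing evaluation reported in the abstract, one would freeze the pre-trained `encoder` and train only a single linear classifier on its features, which is the standard protocol behind the quoted 79.14% RAF-DB accuracy.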