Multi-sourced Knowledge Integration for Robust Self-Supervised Facial Landmark Tracking

Bibliographic details
Published in: IEEE Transactions on Multimedia, 2023-01, Vol. 25, pp. 1-13
Main authors: Zhu, Congong, Li, Xiaoqiang, Li, Jide, Dai, Songmin, Tong, Weiqin
Format: Article
Language: English
Description
Abstract: High annotation costs significantly hinder the development of facial landmark tracking, because dense landmarks must be labeled frame by frame. The most promising approach to this problem is to develop a self-supervised tracker for large-scale unlabeled videos. However, existing self-supervised trackers trained using single-sourced knowledge are unstable in unconstrained environments. Herein, we propose multi-sourced knowledge integration (MSKI), a robust self-supervised tracking method. It integrates knowledge from multiple sources to provide supervisory signals, thereby improving the stability of the self-supervised tracker. Specifically, MSKI comprises two complementary modules: a temporal knowledge reasoning (TempRes) module and an interactive knowledge distillation (KnowDist) module. The TempRes module forces the tracker to achieve cycle-consistent tracking, allowing it to learn temporal correspondence from the cycle-consistency of time. To exploit facial geometry knowledge against various occlusions, our tracker imposes a multi-level shape constraint on the structure of facial landmarks through adversarial shape learning, thereby enabling the tracking of occluded faces. Moreover, the tracker interacts with an initialization detector to further develop complementary knowledge via KnowDist. The KnowDist module distills the spatial and temporal knowledge provided by the detector and tracker to generate plausible labels automatically. Finally, these generated labels are used to fine-tune the detector, so that it provides high-quality initial landmarks for cycle-consistent tracking on unlabeled videos. The experimental results show that MSKI can stabilize the tracking trajectory and improve robustness against various occlusions.
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2022.3212265
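
Note on cycle-consistent tracking: the abstract states that the TempRes module trains the tracker to be cycle-consistent, but this record does not give the loss. The following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a hypothetical `track_step` function that moves landmarks from one frame to the next, tracks forward through a clip and then backward, and measures how far the landmarks end up from where they started. The toy "frames" (2-D offsets) and both toy trackers are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Landmarks = List[Point]
Frame = Tuple[float, float]  # assumption: a toy frame is just a 2-D face offset
# Hypothetical tracker interface: (landmarks in src, src frame, dst frame) -> landmarks in dst
TrackStep = Callable[[Landmarks, Frame, Frame], Landmarks]


def cycle_consistency_error(
    track_step: TrackStep, frames: Sequence[Frame], start: Landmarks
) -> float:
    """Track forward through `frames`, then backward, and return the mean drift."""
    current = list(start)
    # Forward pass: frame 0 -> 1 -> ... -> T
    for src, dst in zip(frames[:-1], frames[1:]):
        current = track_step(current, src, dst)
    # Backward pass: frame T -> T-1 -> ... -> 0
    reversed_frames = frames[::-1]
    for src, dst in zip(reversed_frames[:-1], reversed_frames[1:]):
        current = track_step(current, src, dst)
    # Mean Euclidean distance between the starting and the returned landmarks
    return sum(math.dist(p, q) for p, q in zip(start, current)) / len(start)


def ideal_step(landmarks: Landmarks, src: Frame, dst: Frame) -> Landmarks:
    """Toy tracker that follows the face offset exactly."""
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    return [(x + dx, y + dy) for x, y in landmarks]


def drifting_step(landmarks: Landmarks, src: Frame, dst: Frame) -> Landmarks:
    """Toy tracker that adds a constant 0.5-pixel bias in x at every step."""
    return [(x + 0.5, y) for x, y in ideal_step(landmarks, src, dst)]


frames = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5)]
start = [(10.0, 20.0), (30.0, 40.0)]

ideal_error = cycle_consistency_error(ideal_step, frames, start)
drift_error = cycle_consistency_error(drifting_step, frames, start)
print(ideal_error, drift_error)
```

In this toy setup the exact tracker returns to its starting landmarks, while the biased tracker accumulates drift over the four steps of the cycle. A training signal of this kind needs no manual labels, which is what makes it usable on unlabeled video.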