A divide-and-conquer strategy for facial landmark detection using dual-task CNN architecture


Bibliographic details
Published in: Pattern Recognition, 2020-11, Vol. 107, p. 107504, Article 107504
Main authors: Hannane, Rachida; Elboushaki, Abdessamad; Afdel, Karim
Format: Article
Language: English
Description
Summary:
• We propose a novel deep learning-based framework for facial landmark detection (FLD).
• The proposed framework formulates FLD as a divide-and-conquer search for facial patches using a CNN architecture in a hierarchy.
• A better division of the face topology is obtained by searching in a structured coarse-to-fine manner.
• A cascaded regressor is proposed to detect and refine the position of the individual landmark in each predicted non-overlapping patch.
• We compare our approach with many existing methods and achieve state-of-the-art performance on several challenging datasets.

In this paper, we propose a novel deep learning-based framework for facial landmark detection. The framework takes as input a face image returned by a face detector (Faster R-CNN) and outputs a set of landmark positions. Prior CNN-based methods often randomly select small local patches to predict an initial guess of the landmark locations. One issue with these local patches is that they overlap, so adjacent landmarks may share the same regions and the patches may not convey precise information about each individual landmark. By contrast, our approach formulates the problem as a divide-and-conquer search for facial patches using a CNN architecture in a hierarchy: the input face image is recursively split into two cohesive, non-overlapping subparts until each one contains only the region around the expected landmark. To obtain a better division of the face topology, the search is carried out in a structured coarse-to-fine manner, guided by a learned hierarchical model of the face that defines the granularity of each division level. We also propose a cascaded regressor to detect and refine the position of the individual landmark in each predicted non-overlapping patch. We adopt a carefully designed shallow CNN architecture to improve real-time performance. In addition, unlike previous cascaded methods, our regressor does not require auxiliary input such as initial landmark locations. Extensive experiments on several challenging datasets (MTFL, AFW, AFLW, COFW, 300W, and 300VW) show that our approach performs particularly well in unconstrained scenarios, where it outperforms prior art in both accuracy and efficiency.
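
The recursive splitting described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: `proportional_split` is a hypothetical geometric stand-in for the CNN that predicts each division, `center_regressor` is a hypothetical stand-in for the cascaded regressor, and the balanced halving of landmark counts is an assumption that replaces the learned hierarchical face model.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]   # (x0, y0, x1, y1)
Point = Tuple[float, float]


def proportional_split(box: Box, n_first: int, n_total: int) -> Tuple[Box, Box]:
    """Hypothetical stand-in for the learned splitter (a CNN in the paper).

    Cuts the longer side of the box so that each part's share of the area
    matches its share of the landmarks.
    """
    x0, y0, x1, y1 = box
    frac = n_first / n_total
    if (x1 - x0) >= (y1 - y0):
        xm = x0 + frac * (x1 - x0)
        return (x0, y0, xm, y1), (xm, y0, x1, y1)
    ym = y0 + frac * (y1 - y0)
    return (x0, y0, x1, ym), (x0, ym, x1, y1)


def center_regressor(box: Box) -> Point:
    """Hypothetical stand-in for the cascaded regressor: returns the patch center."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def divide_and_conquer(
    box: Box,
    n_landmarks: int,
    split: Callable[[Box, int, int], Tuple[Box, Box]] = proportional_split,
    regress: Callable[[Box], Point] = center_regressor,
) -> List[Tuple[Box, Point]]:
    """Recursively split a region into two non-overlapping parts until each
    part is responsible for exactly one landmark, then localize it."""
    if n_landmarks == 1:
        return [(box, regress(box))]
    n_first = n_landmarks // 2            # assumption: balanced binary hierarchy
    first, second = split(box, n_first, n_landmarks)
    return (divide_and_conquer(first, n_first, split, regress)
            + divide_and_conquer(second, n_landmarks - n_first, split, regress))


if __name__ == "__main__":
    # A 100 x 100 face crop and five landmarks (as in MTFL-style annotation).
    for patch, landmark in divide_and_conquer((0.0, 0.0, 100.0, 100.0), 5):
        print(patch, landmark)
```

Because every split partitions its parent region, the leaf patches tile the face crop without overlap, which is the property the summary contrasts with randomly selected local patches.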
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2020.107504