CNN (Convolutional Neural Network) and Transformer mixed domain generalization gaze estimation algorithm

Bibliographic details
Main authors: ZHAO WENJUN, GE BIN, ZHOU GUANG'AO, XIA CHENXING, TAO ZHANPENG, GAO XIUJU
Format: Patent
Language: chi; eng
Description
Summary: The invention belongs to the field of computer vision and provides a domain generalization gaze estimation algorithm that combines a CNN (Convolutional Neural Network) with a Transformer. The algorithm comprises the following steps. First, features are extracted by a dual-stream network consisting of a ResNeSt-50 branch and a ViT branch; the multilayer fused features of the ResNeSt-50 branch, rather than the full face image, serve as the input to the ViT network. Then, to reduce the dimensional and semantic differences between the output features of the ResNeSt branch and the ViT branch, a feature fusion enhancement module (FFEM) is designed to fuse the outputs of the two branches. Afterwards, a domain generalization method based on an adversarial strategy is proposed to improve the cross-domain performance of the model. An additional image reconstruction task is designed for adversarial learning with the gaze estimation task, and a mutual information neural estimator (MINE) is used to