An Active-gaze Morphable Model for 3D Gaze Estimation


Bibliographic Details
Main Authors: Sun, Hao; Pears, N. E.; Smith, William Alfred Peter
Format: Conference Proceedings
Language: English
Online Access: Order full text
Description
Abstract: Gaze estimation methods typically regress gaze directions directly from images using a deep network. We show that equipping a deep network with an explicit 3D shape model can: i) improve gaze estimation accuracy, ii) perform well with lower-resolution inputs at high frame rates and, importantly, iii) provide a much richer understanding of the eye region and its constituent gaze system, thus lending itself to a wider range of applications. We use an 'eyes and nose' 3D Morphable Model (3DMM) to capture the relevant local 3D facial geometry and appearance, and we equip this with a geometric vergence model of gaze to give an 'active-gaze 3DMM'. Latent codes express eye-region shape, appearance, pose, scale and gaze directions, and these are regressed using a tiny Swin transformer. We achieve fast real-time performance at 89 fps without fitted-model rendering and 34 fps with rendering. Our system shows state-of-the-art results on the Eyediap dataset, which provides 3D training supervision, and highly competitive results on ETH-XGaze, despite the lack of 3D supervision and without modelling the kappa angle. Indeed, our method can learn with only the ground-truth gaze target point and the camera parameters, without access to the ground-truth gaze origin points, thus significantly widening its applicability.
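The abstract describes a geometric vergence model in which both eyes' gaze rays converge on a single 3D target point, and notes that training needs only the ground-truth target point and the camera parameters, not the gaze origins. The sketch below illustrates that idea only; it is not the authors' implementation, and the function names, tensor shapes and the `world_to_cam` calibration input are assumptions made for illustration.

```python
# Minimal sketch of a vergence-style gaze model and a target-point-only loss.
# Assumed shapes: eyeball centres (B, 2, 3) and gaze target (B, 3), both in
# camera coordinates; `world_to_cam` (B, 4, 4) is an assumed calibration input.

import torch
import torch.nn.functional as F


def vergence_gaze(eye_centres: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Per-eye unit gaze directions pointing from each eyeball centre to a shared 3D target."""
    rays = target.unsqueeze(1) - eye_centres          # (B, 2, 3)
    return F.normalize(rays, dim=-1)


def target_only_loss(eye_centres, pred_target, gt_target_world, world_to_cam):
    """Supervision from the gaze target point and camera parameters only.

    The ground-truth target (e.g. a point on a screen) is mapped into camera
    coordinates with `world_to_cam`; no ground-truth gaze origin is required.
    """
    gt_h = F.pad(gt_target_world, (0, 1), value=1.0)                       # homogeneous (B, 4)
    gt_cam = torch.bmm(world_to_cam, gt_h.unsqueeze(-1)).squeeze(-1)[:, :3]

    pred_dirs = vergence_gaze(eye_centres, pred_target)   # directions to predicted target
    gt_dirs = vergence_gaze(eye_centres, gt_cam)          # directions implied by GT target
    # Cosine (angular) error between predicted and target-implied gaze directions.
    cos = (pred_dirs * gt_dirs).sum(dim=-1).clamp(-1.0, 1.0)
    return (1.0 - cos).mean()


if __name__ == "__main__":
    B = 4
    eye_centres = torch.randn(B, 2, 3)
    pred_target = torch.randn(B, 3) + torch.tensor([0.0, 0.0, 0.6])
    gt_target_world = torch.randn(B, 3)
    world_to_cam = torch.eye(4).expand(B, 4, 4).clone()   # identity calibration for the demo
    print(target_only_loss(eye_centres, pred_target, gt_target_world, world_to_cam))
```

In practice, the eyeball centres and target would be decoded from the latent codes regressed by the network backbone; the point of the sketch is that a single 3D target point, together with camera calibration, is enough to constrain both eyes' gaze directions.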