Depth-aware gaze-following via auxiliary networks for robotics

Bibliographic Details
Published in: Engineering Applications of Artificial Intelligence, 2022-08, Vol. 113, Article 104924
Authors: Jin, Tianlei; Yu, Qizhi; Zhu, Shiqiang; Lin, Zheyuan; Ren, Jie; Zhou, Yuanhai; Song, Wei
Format: Article
Language: English
Description
Abstract: Gaze-following aims to predict the gaze target of a subject within an image, and information on orientation and depth greatly improves this task. However, previous methods require additional datasets to obtain depth or orientation information, leading to cumbersome training or inference processes. To this end, we propose an end-to-end depth-aware gaze-following approach that incorporates depth and orientation information without additional datasets. Our approach defines a primary task, gaze-following, supervised by true labels from the gaze-following dataset, and two auxiliary tasks, scene depth estimation and 3D orientation estimation, supervised by generated pseudo labels. Intermediate auxiliary features are integrated into the primary task network as implicit information, and we propose a residual filter module to screen out the useful information that enhances gaze-following prediction. Extensive experiments on GazeFollow and VideoAttentionTarget show that our approach achieves state-of-the-art results (0.120 Avg. Dist. on GazeFollow and 0.104 L2 Dist. on VideoAttentionTarget). Finally, we apply our approach to a real robot for understanding human attention and intention. Compared with the previous gaze-following method that considers depth, our method halves the computation time.
ISSN: 0952-1976, 1873-6769
DOI: 10.1016/j.engappai.2022.104924
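The abstract describes a multi-task design: a shared network serves a primary gaze-following head plus two auxiliary heads (scene depth, 3D orientation) trained on pseudo labels, with auxiliary features injected into the primary branch through a residual filter module. The code below is a minimal sketch of how such a setup could look in PyTorch; the class names (DepthAwareGazeNet, ResidualFilter), layer shapes, the dense orientation output, and the sigmoid-gated residual form are illustrative assumptions, not the authors' published architecture.

# Minimal sketch (not the authors' code): a shared backbone feeds a primary
# gaze-following head plus two auxiliary heads (depth, 3D orientation).
# Auxiliary predictions are screened by a hypothetical residual-filter gate
# before being added back into the primary branch.
import torch
import torch.nn as nn

class ResidualFilter(nn.Module):
    """Gates auxiliary features and adds only the screened part to the primary features."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels * 2, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, primary: torch.Tensor, auxiliary: torch.Tensor) -> torch.Tensor:
        weights = self.gate(torch.cat([primary, auxiliary], dim=1))
        return primary + weights * auxiliary  # residual injection of filtered features

class DepthAwareGazeNet(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.depth_head = nn.Conv2d(channels, 1, 1)    # auxiliary: scene depth (pseudo labels)
        self.orient_head = nn.Conv2d(channels, 3, 1)   # auxiliary: 3D orientation, simplified to a dense map
        self.depth_proj = nn.Conv2d(1, channels, 1)
        self.orient_proj = nn.Conv2d(3, channels, 1)
        self.depth_filter = ResidualFilter(channels)
        self.orient_filter = ResidualFilter(channels)
        self.gaze_head = nn.Conv2d(channels, 1, 1)     # primary: gaze-target heatmap

    def forward(self, image: torch.Tensor):
        feat = self.backbone(image)
        depth = self.depth_head(feat)
        orient = self.orient_head(feat)
        feat = self.depth_filter(feat, self.depth_proj(depth))
        feat = self.orient_filter(feat, self.orient_proj(orient))
        return self.gaze_head(feat), depth, orient

# Usage: a single forward pass yields the gaze heatmap plus both auxiliary
# predictions, so no separate depth or orientation network is needed at inference.
model = DepthAwareGazeNet()
heatmap, depth, orient = model(torch.randn(1, 3, 224, 224))

In this sketch the auxiliary heads would be trained against pseudo labels while the gaze head uses the dataset's ground-truth annotations, which is consistent with the end-to-end, single-network training the abstract claims.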