Towards a system for the semi-automatic annotation of eye gaze data in face-to-face interactions

Bibliographic Details
Main Authors: De Beugher, Stijn, Brône, Geert, Goedemé, Toon
Format: Other
Language: eng
Description
Abstract: Studies on language and cognition in interaction increasingly focus on the role of eye gaze as an important signal in interaction management, reference and grounding (Rossano 2012, Jokinen 2010, Bailly et al. 2010, Richardson et al. 2009). Interlocutors may use eye gaze as a means to take, hold or give the floor in conversation (turn management), to refer to objects or persons in the conversational space (gaze cueing), or to give and elicit feedback (grounding). The use of non-intrusive eye-tracking technology (such as eye-tracking glasses or table-top systems) has proven to be an invaluable resource for obtaining detailed information on the distribution of visual attention of multiple participants simultaneously (Jokinen 2010, Oertel & Salvi 2013, Brône & Oben 2015, Oben 2015, Holler & Kendrick 2015). This generates a picture of the role of eye gaze in interactional dynamics, with speakers and addressees displaying different gaze patterns.

One of the key challenges in the use of mobile eye-tracking technology, however, resides in the processing and annotation of the obtained data stream. To date, the annotation process is largely manual, which is time-consuming and labour-intensive. Part of this work, however, can be automated using recognition algorithms from vision technology (De Beugher, Brône & Goedemé 2013). In this paper, we present one such system for the semi-automatic recognition of human faces, torsos and hands, thus providing a first categorization of targets onto which the gaze data of the eye-tracking systems can be mapped. In other words, the system analyses data captured by the scene camera of the eye-tracker, calculates scores for the annotation classes (face, torso, hands), and then searches for matches between gaze coordinates and annotations (i.e. are there gaze fixations on faces, torsos and hands?).

The approach we present in this talk partly builds on previous work on the detection of human torsos and faces (De Beugher, Brône & Goedemé 2013), but has been improved to reduce the computational cost of the algorithms. The detection of hands, on the other hand, is based on an accurate segmentation in combination with advanced tracking mechanisms and a validation of human poses. The algorithms we use are embedded in a semi-automatic tool, which calculates the confidence of the hand detections. If the confidence drops below a certain threshold, the automatic analysis is halted and the user is asked for a manual annotation. After this intervention, the system resumes the automatic analysis.
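
To make the gaze-to-annotation matching step concrete, the sketch below shows how gaze fixations could be mapped onto detected regions. This is a minimal illustration, not the authors' implementation: it assumes detections are axis-aligned bounding boxes per class and that fixations are (x, y) pixel coordinates in the scene-camera frame; all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str  # "face", "torso" or "hand"
    x1: int     # bounding box in scene-camera pixel coordinates
    y1: int
    x2: int
    y2: int

def label_fixation(gx, gy, detections):
    """Return the labels of all detected regions containing the gaze point."""
    hits = [d.label for d in detections
            if d.x1 <= gx <= d.x2 and d.y1 <= gy <= d.y2]
    return hits or ["background"]

# Example: one fixation falling inside a detected face box
dets = [Detection("face", 310, 80, 390, 170),
        Detection("torso", 280, 170, 420, 400)]
print(label_fixation(350, 120, dets))  # -> ['face']
```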
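The abstract describes the hand detector as an accurate segmentation combined with tracking and pose validation, without giving details. As a generic illustration of the segmentation component only (a common skin-colour baseline, not necessarily the method used in the paper), the following OpenCV snippet produces a skin mask in YCrCb colour space; the threshold values are generic defaults, not taken from the paper.

```python
import cv2
import numpy as np

def skin_mask(frame_bgr):
    """Crude skin segmentation in YCrCb space; bounds are generic defaults."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    lower = np.array([0, 133, 77], dtype=np.uint8)
    upper = np.array([255, 173, 127], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower, upper)
    # Morphological opening removes small speckles before further analysis
    kernel = np.ones((5, 5), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```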
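The confidence-gated hand-off to manual annotation can likewise be sketched. This is an illustrative skeleton under assumed names: detect_hands, ask_user_annotation and the threshold value are hypothetical placeholders standing in for the tool's detector, annotation dialogue and tuned threshold.

```python
CONFIDENCE_THRESHOLD = 0.6  # hypothetical value; the abstract does not specify one

def detect_hands(frame):
    """Placeholder for the automatic hand detector: returns (boxes, confidence)."""
    return [], 0.0  # stub: no detections, zero confidence

def ask_user_annotation(frame):
    """Placeholder for the manual-annotation dialogue shown to the user."""
    return []  # stub: the user would draw or confirm boxes here

def annotate_frames(frames):
    annotations = []
    for i, frame in enumerate(frames):
        boxes, confidence = detect_hands(frame)
        if confidence < CONFIDENCE_THRESHOLD:
            # Automatic analysis halts; the user supplies a manual annotation,
            # after which automatic processing resumes from this frame.
            boxes = ask_user_annotation(frame)
        annotations.append((i, boxes))
    return annotations
```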