Talking faces indexing in TV-content

Bibliographic details
Main authors: Bendris, Meriem; Charlet, Delphine; Chollet, Gérard
Format: Conference proceedings
Language: English
Description
Summary: Our objective is to index talking faces in a TV context: to build a description of TV content in terms of talking people, without any predefined dictionary of identities. In TV content, multi-face shots and shots of non-speaking faces make it difficult to determine which face is speaking. In this work, a method is proposed that clusters people independently from the audio and from the visual information, then combines the two clusterings to detect sequences of talking faces. The audio indexing system is based on agglomerative clustering with the Bayesian Information Criterion. The visual indexing system is based on costume detection and clustering of color histograms. The two indexes are combined by searching for the best match between the two clusterings, which yields a correspondence between the automatic audio labels and the automatic video labels. The talking faces are then determined by intersecting the segments of the associated audio and video labels. Experiments on a TV-show database show that the proposed method can achieve a high correct detection rate.
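The combination step described in the summary (matching audio labels to video labels, then intersecting the matched segments) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `(start, end, label)` segment format, the function names, the example labels, and in particular the greedy one-to-one matching by total overlap time. The summary only says that the best match between the two clusterings is searched for, not how.

```python
from collections import defaultdict


def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def cooccurrence(audio_segs, video_segs):
    """Total overlap time for each (audio label, video label) pair."""
    totals = defaultdict(float)
    for a_start, a_end, a_label in audio_segs:
        for v_start, v_end, v_label in video_segs:
            dur = overlap(a_start, a_end, v_start, v_end)
            if dur > 0:
                totals[(a_label, v_label)] += dur
    return totals


def match_labels(totals):
    """Greedy one-to-one matching (assumed criterion): largest overlap first."""
    mapping = {}
    used_video = set()
    for (a_label, v_label), _ in sorted(totals.items(), key=lambda kv: -kv[1]):
        if a_label not in mapping and v_label not in used_video:
            mapping[a_label] = v_label
            used_video.add(v_label)
    return mapping


def talking_faces(audio_segs, video_segs, mapping):
    """Intersect segments whose audio and video labels were matched."""
    result = []
    for a_start, a_end, a_label in audio_segs:
        for v_start, v_end, v_label in video_segs:
            if mapping.get(a_label) != v_label:
                continue
            start, end = max(a_start, v_start), min(a_end, v_end)
            if end > start:
                result.append((start, end, a_label, v_label))
    return sorted(result)


# Hypothetical clustering output: times in seconds, labels are arbitrary.
# Video segments may overlap in time (multi-face shots).
audio = [(0.0, 10.0, "spk1"), (10.0, 18.0, "spk2"), (18.0, 25.0, "spk1")]
video = [(0.0, 12.0, "faceA"), (8.0, 18.0, "faceB"), (20.0, 25.0, "faceA")]

mapping = match_labels(cooccurrence(audio, video))
segments = talking_faces(audio, video, mapping)
```

In this toy input, `faceA` stays on screen from 10 s to 12 s while `spk2` is speaking; because `spk2` is matched to `faceB`, that interval is not reported as a talking-face segment for `faceA`. A globally optimal assignment could replace the greedy step, at the cost of a more involved implementation.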
ISSN: 1949-3983, 1949-3991
DOI: 10.1109/CBMI.2010.5529907