Active speaker detection with audio-visual co-training


Detailed description

Bibliographic details
Main authors: Chakravarty, Jay, Zegers, Jeroen, Tuytelaars, Tinne, Van hamme, Hugo
Format: Conference paper
Language: English
Online access: Order full text
Description
Summary: © 2016 ACM. In this work, we show how to co-train a classifier for active speaker detection using audio-visual data. First, audio Voice Activity Detection (VAD) is used to train a personalized video-based active speaker classifier in a weakly supervised fashion. The video classifier is in turn used to train a voice model for each person. The individual voice models are then used to detect active speakers. There is no manual supervision: audio weakly supervises video classification, and the co-training loop is completed by using the trained video classifier to supervise the training of a personalized audio voice classifier.
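The co-training loop described in the summary can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: simple 1-D threshold "classifiers" stand in for the paper's actual VAD, video, and voice models, and all function names and data are placeholders.

```python
# Illustrative sketch of the audio-visual co-training loop: audio VAD weakly
# supervises a video classifier, which in turn supervises a personalized
# voice model. All names here are assumptions, not the paper's code.

def audio_vad(energies, threshold=0.5):
    """Weak labels from audio: 1 where frame energy exceeds a threshold."""
    return [int(e > threshold) for e in energies]

def fit_threshold(features, labels):
    """Train a toy 1-D classifier: midpoint of positive and negative means."""
    pos = [f for f, l in zip(features, labels) if l]
    neg = [f for f, l in zip(features, labels) if not l]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def classify(features, threshold):
    return [int(f > threshold) for f in features]

def co_train(audio_energies, video_features, voice_features):
    # Step 1: audio VAD provides weak labels -- no manual supervision.
    weak_labels = audio_vad(audio_energies)
    # Step 2: weak labels train the video-based active speaker classifier.
    video_thr = fit_threshold(video_features, weak_labels)
    video_labels = classify(video_features, video_thr)
    # Step 3: video predictions supervise the personalized voice model,
    # completing the loop; the voice model then detects active speakers.
    voice_thr = fit_threshold(voice_features, video_labels)
    return classify(voice_features, voice_thr)
```

For example, given six synthetic frames where the first, second, and fifth contain speech, `co_train([0.9, 0.8, 0.1, 0.2, 0.95, 0.05], [0.8, 0.7, 0.2, 0.1, 0.9, 0.15], [0.85, 0.75, 0.1, 0.2, 0.9, 0.1])` recovers the speaking frames from the voice model alone.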