Variational conditional random fields for online speaker detection and tracking
Published in: | Speech Communication 2012-07, Vol.54 (6), p.763-780 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | ► Speaker tracking using conditional random fields is discussed. ► The main contribution is the adaptation of a variational framework to CRF training. ► The results show good performance of the CRF model for speaker detection and tracking.
Many studies address a specific aspect of speaker tracking. This paper focuses on the speaker modeling issue and proposes conditional random fields (CRFs) for this purpose. CRFs are a class of undirected graphical models for classifying sequential data. They have some interesting characteristics that encouraged us to use this model in a speaker modeling and tracking task. The main difficulty with the CRF model is its training: known approaches to CRF training are prone to overfitting and unreliable convergence. To address this problem, variational approaches are proposed in this paper. The main novelty of this paper is the adaptation of a variational framework to CRF training. The resulting approach is evaluated in three areas. First, the best CRF model configuration for speaker modeling is determined on text-independent speaker verification. Next, the selected model is used in a speaker detection task in which the models of the speakers in the conversation are known a priori. Finally, the proposed CRF approach is compared with a Gaussian mixture model (GMM) in an online speaker tracking framework. The results show that the proposed CRF model is superior to the GMM in speaker detection and tracking, owing to its capability for sequence modeling and segmentation. |
---|---|
ISSN: | 0167-6393 1872-7182 |
DOI: | 10.1016/j.specom.2012.01.005 |