Speaker recognition with region-constrained MLLR transforms

Bibliographic details
Main authors: Stolcke, A., Mandal, A., Shriberg, E.
Format: Conference proceedings
Language: English
Description
Abstract: It has been shown that standard cepstral speaker recognition models can be enhanced by region-constrained models, where features are extracted only from certain speech regions defined by linguistic or prosodic criteria. Such region-constrained models can capture features that are more stable, highly idiosyncratic, or simply complementary to the baseline system. In this paper we ask whether another major class of speaker recognition models, those based on MLLR speaker adaptation transforms, can also benefit from region-constrained feature extraction. In our approach, we define regions by phonetic and prosodic criteria derived from automatic speech recognition output, and perform MLLR estimation using only the frames selected by these criteria. The resulting transform features are appended to those of a state-of-the-art MLLR speaker recognition system and jointly modeled by SVMs. Multiple regions can be added in this fashion. We find consistent gains over the baseline system in the SRE2010 speaker verification task.
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2012.6288894
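
The pipeline the abstract describes (select frames by region, estimate a transform from those frames only, append the result to the baseline transform features) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes hard frame-to-Gaussian alignment and identity covariances, under which a single mean transform can be fitted by ordinary least squares; a real MLLR system estimates transforms against the recognizer's full Gaussian models. All names (`estimate_affine_transform`, `region_constrained_features`, `model_means`, `region_masks`) and the synthetic data are hypothetical.

```python
import numpy as np


def estimate_affine_transform(model_means, frames):
    """Fit [A | b] so that A @ mean + b approximates each observed frame.

    Simplified stand-in for MLLR mean-transform estimation (assumption:
    hard alignment, identity covariances, one global transform).
    """
    # Append a constant 1 to each mean so the bias b is fitted jointly with A.
    extended = np.hstack([model_means, np.ones((model_means.shape[0], 1))])
    solution, *_ = np.linalg.lstsq(extended, frames, rcond=None)
    return solution.T  # shape (dim, dim + 1), laid out as [A | b]


def region_constrained_features(model_means, frames, region_masks):
    """Concatenate the baseline transform with one transform per region."""
    parts = [estimate_affine_transform(model_means, frames).ravel()]
    for mask in region_masks:
        # Use only the frames that fall inside this region.
        region = estimate_affine_transform(model_means[mask], frames[mask])
        parts.append(region.ravel())
    return np.concatenate(parts)


# Synthetic, noise-free data for illustration only.
rng = np.random.default_rng(0)
dim, n_frames = 3, 200
A_true = rng.normal(size=(dim, dim))
b_true = rng.normal(size=dim)
model_means = rng.normal(size=(n_frames, dim))
frames = model_means @ A_true.T + b_true

# Hypothetical region: every second frame (standing in for, e.g., one phone class).
region_mask = np.arange(n_frames) % 2 == 0
features = region_constrained_features(model_means, frames, [region_mask])
```

The resulting `features` vector holds the baseline transform followed by the region transform; in the setup the abstract describes, such a concatenated vector would then be passed to an SVM classifier, a step this sketch leaves out.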