Enabling improved speaker recognition by voice quality estimation

Bibliographic Details
Main authors:	Bartos, A. L., Nelson, D. J.
Format:	Conference proceeding
Language:	English
Subjects:	EER; Equal Error Rate; Language ID; LID; Load modeling; SAD; SID; Signal to noise ratio; SNR; Speaker ID; Speech; Speech Activity Detection; Speech processing; Training; Training data; VAD; Voice Activity Detection; Voice Quality Estimate; VQE

Description
Summary:	This paper presents a method for mitigating noise and interference in automated speaker identification (SID). The method uses the MIT/LL SID module without modification. Speaker models are built for a lattice of signal-to-noise ratio (SNR) levels. To estimate the SNR of a received signal, speech activity detection first identifies the portions of the signal that actually contain speech; a voice quality estimation process is then applied to those portions to estimate the SNR. The speaker models matching the estimated SNR are dynamically loaded, and conventional SID is applied. In training, the SNR of each training signal is estimated, and noise is added to produce a signal at the desired SNR. Each signal can therefore be used to train models at any SNR level less than or equal to the SNR of the original signal. The process has been fully implemented and is completely automated.
ISSN:	1058-6393 2576-2303
DOI:	10.1109/ACSSC.2011.6190071
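
Illustrative note: the summary states that a training signal can only be degraded to SNR levels at or below its own estimated SNR, and that models are selected at recognition time according to the estimated SNR. The following is a minimal sketch of that arithmetic, not the authors' implementation. It assumes that the added noise is uncorrelated with the signal, that power is measured over the whole signal rather than only over speech-active portions, and that the model is chosen by nearest lattice level. All function names and the example lattice are hypothetical.

```python
import math


def power(samples):
    """Mean power of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def added_noise_power(signal_power, est_snr_db, target_snr_db):
    """Noise power to add so a signal at est_snr_db ends up at target_snr_db.

    With speech power Ps and existing noise power Pn, the observed power is
    Ps + Pn and the SNR is Ps / Pn. Adding uncorrelated noise of power Pa
    gives a new SNR of Ps / (Pn + Pa). Pa must be non-negative, so the
    target cannot exceed the estimated SNR.
    """
    if target_snr_db > est_snr_db:
        raise ValueError("target SNR cannot exceed the estimated SNR")
    snr_est = 10 ** (est_snr_db / 10)
    snr_target = 10 ** (target_snr_db / 10)
    speech_power = signal_power / (1 + 1 / snr_est)
    return speech_power * (1 / snr_target - 1 / snr_est)


def degrade(signal, noise, est_snr_db, target_snr_db):
    """Mix scaled noise into the signal to reach the target SNR (sketch)."""
    pa = added_noise_power(power(signal), est_snr_db, target_snr_db)
    gain = math.sqrt(pa / power(noise))
    return [s + gain * n for s, n in zip(signal, noise)]


def reachable_levels(est_snr_db, lattice):
    """Lattice levels this training signal can be degraded to."""
    return [level for level in lattice if level <= est_snr_db]


def nearest_level(est_snr_db, lattice):
    """Assumed selection rule: load the models at the closest lattice level."""
    return min(lattice, key=lambda level: abs(level - est_snr_db))


if __name__ == "__main__":
    lattice = [0, 5, 10, 15, 20]  # hypothetical SNR lattice in dB
    print(reachable_levels(12.0, lattice))
    print(nearest_level(7.2, lattice))
```

Under these assumptions, a training signal estimated at 12 dB could contribute to the 0, 5, and 10 dB models but not to the 15 or 20 dB models, which matches the constraint stated in the summary.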