CASA-Based Robust Speaker Identification

Bibliographic details
Published in:	IEEE Transactions on Audio, Speech, and Language Processing, 2012-07, Vol. 20 (5), pp. 1608-1616
Authors:	Zhao, Xiaojia; Shao, Yang; Wang, DeLiang
Format:	Article
Language:	English
Subjects:	Applied sciences; Cepstral analysis; Computational auditory scene analysis (CASA); Exact sciences and technology; Feature extraction; Filter banks; gammatone frequency cepstral coefficient (GFCC); ideal binary mask; Information, signal and communications theory; Miscellaneous; Noise measurement; robust speaker identification; Robustness; Signal and communications theory; Signal processing; Signal representation. Spectral analysis; Signal, noise; Speaker recognition; Speech; Speech processing; Telecommunications and information theory

Description
Abstract:	Conventional speaker recognition systems perform poorly under noisy conditions. Inspired by auditory perception, computational auditory scene analysis (CASA) typically segregates speech by producing a binary time-frequency mask. We investigate CASA for robust speaker identification. We first introduce a novel speaker feature, gammatone frequency cepstral coefficient (GFCC), based on an auditory periphery model, and show that this feature captures speaker characteristics and performs substantially better than conventional speaker features under noisy conditions. To deal with noisy speech, we apply CASA separation and then either reconstruct or marginalize corrupted components indicated by a CASA mask. We find that both reconstruction and marginalization are effective. We further combine the two methods into a single system based on their complementary advantages, and this system achieves significant performance improvements over related systems under a wide range of signal-to-noise ratios.
ISSN:	1558-7916, 2329-9290, 1558-7924, 2329-9304
DOI:	10.1109/TASL.2012.2186803
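The binary time-frequency mask and the marginalization step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It computes an ideal binary mask from separately known speech and noise energies using an assumed local criterion `lc_db` (0 dB by default), then scores a spectral frame with plain marginalization under a diagonal-covariance GMM, simply dropping the channels the mask marks as unreliable. The paper's CASA mask is estimated rather than ideal, and the paper may use a different (for example, bounded) form of marginalization. All function names here are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]  # frames x channels, e.g. per-unit gammatone energies


def ideal_binary_mask(speech: Grid, noise: Grid, lc_db: float = 0.0) -> List[List[int]]:
    """Hypothetical helper: label a T-F unit 1 when its local SNR exceeds lc_db.

    Needs the premixed speech and noise energies separately, which is why
    such a mask is called 'ideal'.
    """
    threshold = 10.0 ** (lc_db / 10.0)
    # s > n * threshold is equivalent to 10*log10(s/n) > lc_db, with no division by zero
    return [[1 if s > n * threshold else 0 for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(speech, noise)]


def marginal_log_likelihood(frame: List[float], mask_row: List[int],
                            means: List[float], variances: List[float]) -> float:
    """Hypothetical helper: diagonal-Gaussian log-likelihood over reliable channels only.

    With a diagonal covariance, integrating out the unreliable channels
    (mask == 0) just removes their terms from the sum.
    """
    total = 0.0
    for x, m, mu, var in zip(frame, mask_row, means, variances):
        if m:
            total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total


def gmm_marginal_log_likelihood(frame: List[float], mask_row: List[int],
                                weights: List[float], means: Grid, variances: Grid) -> float:
    """Hypothetical helper: marginal log-likelihood under a diagonal-covariance GMM.

    Combines the per-component scores with a numerically stable log-sum-exp.
    """
    logs = [math.log(w) + marginal_log_likelihood(frame, mask_row, mu, var)
            for w, mu, var in zip(weights, means, variances)]
    peak = max(logs)
    return peak + math.log(sum(math.exp(lp - peak) for lp in logs))
```

In a GMM-based speaker identification setup, one would typically sum this per-frame score over all frames of an utterance for each enrolled speaker's model and pick the speaker with the highest total. The reconstruction alternative mentioned in the abstract, which instead estimates the corrupted components before scoring, is not shown here.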