Subband architecture for automatic speaker recognition


Bibliographic details
Published in: Signal Processing, 2000-07, Vol. 80 (7), pp. 1245-1259
Main authors: Besacier, Laurent; Bonastre, Jean-François
Format: Article
Language: English
Description
Abstract: We present an original approach to automatic speaker identification that is especially applicable to environments which cause partial corruption of the frequency spectrum of the signal. The general principle is to split the whole frequency domain into several subbands on which statistical recognizers are independently applied, and then to recombine their outputs to yield a global score and a global recognition decision. The choice of the subband architecture and the recombination strategies are discussed in particular. This technique has been shown to be robust for speech recognition when narrow-band noise degradation occurs. We first objectively verify this robustness for the speaker identification task. We also study which information is really used to recognize speakers. To this end, speaker identification experiments on independent subbands are conducted for 630 speakers of the TIMIT and NTIMIT databases. The results show that speaker-specific information is not equally distributed among subbands. In particular, the low-frequency subbands (under 600 Hz) and the high-frequency subbands (over 3000 Hz) are more speaker-specific than the middle-frequency ones. In addition, experiments on different subband system architectures show that the correlations between frequency channels are of prime importance for speaker recognition. Some of these correlations are lost when the frequency domain is divided into subbands. Consequently, we propose a particularly redundant parallel architecture in which most of the correlations are kept. The performance obtained with this new system, using linear recombination strategies, is equivalent to that of a conventional fullband recognizer on clean and telephone speech. Experiments on speech corrupted by unpredictable noise show that this approach adapts better to noisy environments than a conventional device, especially when some recognizers are pruned.
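The recombination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each subband recognizer has already produced one score per speaker (for example a log-likelihood), that recombination is a weighted sum, and that pruning is expressed by a caller-supplied mask. The function names `recombine` and `identify`, the `active` mask, the speaker names, and all score values are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional, Sequence


def recombine(
    subband_scores: Sequence[Dict[str, float]],
    weights: Optional[Sequence[float]] = None,
    active: Optional[Sequence[bool]] = None,
) -> Dict[str, float]:
    """Linearly recombine per-subband speaker scores into global scores.

    subband_scores[b][speaker] is the score that the recognizer for
    subband b assigns to `speaker` (assumed: higher means more likely).
    weights holds one weight per subband (uniform if omitted).
    active marks which recognizers are kept; False means pruned.
    """
    n = len(subband_scores)
    if weights is None:
        weights = [1.0 / n] * n
    if active is None:
        active = [True] * n

    speakers = list(subband_scores[0].keys())
    total = {speaker: 0.0 for speaker in speakers}
    for scores, weight, is_active in zip(subband_scores, weights, active):
        if not is_active:
            continue  # pruned recognizer: contributes nothing to the sum
        for speaker in speakers:
            total[speaker] += weight * scores[speaker]
    return total


def identify(
    subband_scores: Sequence[Dict[str, float]],
    weights: Optional[Sequence[float]] = None,
    active: Optional[Sequence[bool]] = None,
) -> str:
    """Return the speaker with the highest recombined (global) score."""
    global_scores = recombine(subband_scores, weights, active)
    return max(global_scores, key=global_scores.get)


# Invented scores for three subbands and two speakers. The middle subband
# stands in for a band corrupted by narrow-band noise.
example_scores = [
    {"alice": -10.0, "bob": -12.0},
    {"alice": -30.0, "bob": -11.0},
    {"alice": -9.0, "bob": -13.0},
]

print(identify(example_scores))
print(identify(example_scores, active=[True, False, True]))
```

With these invented numbers, the corrupted middle subband outweighs the other two under uniform weighting, and pruning it changes the decision. How to choose which recognizers to prune is not specified in the abstract, so the mask is left to the caller here.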
ISSN: 0165-1684, 1872-7557
DOI: 10.1016/S0165-1684(00)00033-5