Subband architecture for automatic speaker recognition
Published in: | Signal Processing, 2000-07, Vol. 80 (7), p. 1245-1259 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | We present an original approach to automatic speaker identification that is especially applicable to environments which cause partial corruption of the frequency spectrum of the signal. The general principle is to split the whole frequency domain into several subbands, apply statistical recognizers to them independently, and then recombine the outputs to yield a global score and a global recognition decision. The choice of the subband architecture and the recombination strategies are discussed in particular. This technique has been shown to be robust for speech recognition when narrow-band noise degradation occurs. We first objectively verify this robustness for the speaker identification task. We also study which information is actually used to recognize speakers. To this end, speaker identification experiments on independent subbands are conducted for 630 speakers of the TIMIT and NTIMIT databases. The results show that speaker-specific information is not equally distributed among subbands. In particular, the low-frequency subbands (under 600 Hz) and the high-frequency subbands (over 3000 Hz) are more speaker-specific than the middle-frequency ones. In addition, experiments on different subband system architectures show that the correlations between frequency channels are of prime importance for speaker recognition. Some of these correlations are lost when the frequency domain is divided into subbands. Consequently, we propose a particularly redundant parallel architecture in which most of the correlations are kept. The performance obtained with this new system, using linear recombination strategies, is equivalent to that of a conventional fullband recognizer on clean and telephone speech. Experiments on speech corrupted by unpredictable noise show that this approach adapts better to noisy environments than a conventional device, especially when some recognizers are pruned.
|
---|---|
ISSN: | 0165-1684 1872-7557 |
DOI: | 10.1016/S0165-1684(00)00033-5 |
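
The recombination step described in the abstract (independent subband recognizers whose scores are linearly combined, with optional pruning of some recognizers) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `recombine_scores` and `identify`, the uniform default weights, the normalization by the sum of kept weights, and the toy scores are all assumptions made for illustration. The record does not say how the paper chooses weights or decides which recognizers to prune.

```python
from typing import Dict, Optional, Sequence


def recombine_scores(
    subband_scores: Sequence[Dict[str, float]],
    weights: Optional[Sequence[float]] = None,
    pruned: Sequence[int] = (),
) -> Dict[str, float]:
    """Linearly combine per-subband speaker scores into global scores.

    subband_scores[k][s] is the score (for example a log-likelihood) that
    the hypothetical recognizer for subband k assigns to speaker s.
    Subbands whose indices appear in `pruned` are ignored, for example
    because they are believed to be corrupted by noise.
    """
    n = len(subband_scores)
    if weights is None:
        weights = [1.0] * n  # assumption: uniform weights by default
    pruned_set = set(pruned)
    kept = [k for k in range(n) if k not in pruned_set]
    if not kept:
        raise ValueError("all subbands were pruned")
    total = sum(weights[k] for k in kept)
    speakers = subband_scores[kept[0]].keys()
    # Weighted average of the kept subband scores for each speaker.
    return {
        s: sum(weights[k] * subband_scores[k][s] for k in kept) / total
        for s in speakers
    }


def identify(global_scores: Dict[str, float]) -> str:
    """Return the speaker with the highest global score."""
    return max(global_scores, key=global_scores.get)


# Toy example (made-up numbers): subband 1 is corrupted and misleads
# the combined decision unless it is pruned.
scores = [
    {"A": -1.0, "B": -2.0},
    {"A": -9.0, "B": -3.0},
    {"A": -1.5, "B": -2.5},
]
decision_all = identify(recombine_scores(scores))
decision_pruned = identify(recombine_scores(scores, pruned=[1]))
```

In this toy case the decision changes from speaker "B" to speaker "A" once the corrupted subband is pruned, which mirrors the kind of benefit the abstract reports for pruning under unpredictable noise.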