Speaker Clustering and Cluster Purification Methods for RT07 and RT09 Evaluation Meeting Data
Published in: | IEEE Transactions on Audio, Speech, and Language Processing, 2012-02, Vol. 20 (2), p. 461-473 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Abstract: | This paper presents a design strategy for the speaker diarization system in the IIR submissions to the 2007 and 2009 NIST Rich Transcription Meeting Recognition Evaluations (RT07 and RT09) for the multiple distant microphone (MDM) condition. The system features two algorithms that support two important steps in the diarization process. The first step is Initial Segmentation and Clustering (ISC), and the second is cluster merging and purification. In the ISC step, we propose a histogram quantization and clustering technique based on time delay of arrival (TDOA) features, obtained by analyzing the correlation among the signals across multiple distant microphones. In the cluster merging and purification step, we further merge the speaker clusters using a Bayesian information criterion (BIC), consolidating them to arrive at one cluster per speaker. The two steps work in tandem to form an integral process. We propose a novel Consensus Based Cluster Purification (CBCP) method, which removes impure speaker segments from the speaker clusters before speaker modeling in the cluster purification process. The system achieves state-of-the-art speaker diarization performance for the RT07 and RT09 MDM conditions, with diarization error rates (DERs) of 7.47% and 8.77%, respectively, for both overlapping and non-overlapping speech. |
---|---|
ISSN: | 1558-7916 2329-9290 1558-7924 2329-9304 |
DOI: | 10.1109/TASL.2011.2159203 |
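
The abstract mentions merging speaker clusters with a Bayesian information criterion (BIC) but this record does not give the paper's formulation. As a rough illustration only, the sketch below uses the textbook delta-BIC test between two clusters, assuming each cluster is modeled by a single diagonal-covariance Gaussian. The function names (`log_det_diag`, `delta_bic`, `should_merge`), the `penalty_weight` parameter, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import math
import random


def log_det_diag(frames):
    """Log-determinant of the maximum-likelihood diagonal covariance of the frames."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for k in range(dim):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-10))  # variance floor avoids log(0)
    return total


def delta_bic(cluster_a, cluster_b, penalty_weight=1.0):
    """Delta-BIC between modeling two clusters separately and as one merged cluster.

    Assumption: one diagonal-covariance Gaussian per cluster (2 * dim parameters).
    Positive values favor keeping the clusters apart; values <= 0 favor merging.
    """
    n_a, n_b = len(cluster_a), len(cluster_b)
    n = n_a + n_b
    dim = len(cluster_a[0])
    merged = cluster_a + cluster_b
    # Log-likelihood gain of two separate models over one merged model.
    gain = 0.5 * (
        n * log_det_diag(merged)
        - n_a * log_det_diag(cluster_a)
        - n_b * log_det_diag(cluster_b)
    )
    # Complexity penalty for the extra Gaussian (mean and variance per dimension).
    penalty = 0.5 * penalty_weight * (2 * dim) * math.log(n)
    return gain - penalty


def should_merge(cluster_a, cluster_b, penalty_weight=1.0):
    """Merge when the separate models do not justify their extra parameters."""
    return delta_bic(cluster_a, cluster_b, penalty_weight) <= 0.0


def _synthetic_cluster(rng, mean, n=400, dim=4):
    """Hypothetical stand-in for acoustic feature frames of one cluster."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
same_a = _synthetic_cluster(rng, 0.0)
same_b = _synthetic_cluster(rng, 0.0)
other = _synthetic_cluster(rng, 5.0)
```

In this sketch, `should_merge(same_a, same_b)` compares two clusters drawn from the same distribution, while `should_merge(same_a, other)` compares clusters with well-separated means. A real system would typically use full-covariance Gaussians or Gaussian mixtures and tune the penalty weight on development data.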