Effect of Clinical Depression on Automatic Speaker Identification

Bibliographic details
Main authors:	Memon, S., Maddage, N., Lech, M., Allen, N.
Format:	Conference paper
Language:	English
Keywords:	Data mining, Degradation, Feature extraction, Loudspeakers, Mel frequency cepstral coefficient, Performance analysis, Psychology, Speaker recognition, Speech recognition, Speech synthesis

Description
Abstract:	This study investigates the effects of a clinical environment on speaker recognition rates. Two sets of speakers were used: a clinical set containing speech recordings of 70 clinically depressed speakers and a control set containing 68 non-depressed speakers. MFCC features were used to build statistical speaker models with four modeling methods: GMM_EM, GMM_K-means, GMM_LBG, and LBG_ITVQ. In all cases, the speaker recognition rates for the depressed speakers were lower (60%-71%) than for the non-depressed speakers (79%-89%). We also analyze the performance of VQ-based Gaussian modeling; the results indicate that GMM-EM gives the highest recognition rates, although the performance of GMM-ITVQ is comparable. Further experiments with between 1 and 1024 Gaussian mixtures show that adding more mixtures increases model complexity and spreads the training data more thinly across the mixtures, which degrades the recognition rate. The results also suggest that the amount of training and test speech can strongly affect recognition rates.
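The abstract lists GMM_EM as one of the speaker modeling methods. As a rough illustration only, and not the authors' implementation, the Python sketch below trains one diagonal-covariance GMM per speaker with EM and identifies a test utterance as the speaker whose model gives the highest average frame log-likelihood. It assumes MFCC frames have already been extracted; the synthetic 13-dimensional "frames", the speaker labels, the mixture count, and the helper names (`train_gmm_em`, `identify`, etc.) are hypothetical and chosen for the example. The random-frame initialisation stands in for the K-means, LBG, or ITVQ initialisations compared in the paper.

```python
import numpy as np

def _logsumexp(a):
    # Numerically stable log(sum(exp(a))) over axis 1.
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()

def _component_log_likelihoods(X, weights, means, variances):
    # log w_k + log N(x | mu_k, diag(var_k)) for every frame and component: (n, K)
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_det = np.sum(np.log(variances), axis=1)
    return np.log(weights)[None, :] - 0.5 * (d * np.log(2 * np.pi) + log_det[None, :] + mahal)

def train_gmm_em(X, n_components, n_iter=50, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM to frames X (n_frames x n_dims) with EM."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Hypothetical initialisation: random frames as means, global variance for all components.
    means = X[rng.choice(n, n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + var_floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        log_p = _component_log_likelihoods(X, weights, means, variances)
        resp = np.exp(log_p - _logsumexp(log_p)[:, None])
        # M-step: re-estimate weights, means and (floored) variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, var_floor)
    return weights, means, variances

def avg_log_likelihood(X, model):
    # Average per-frame log-likelihood of an utterance under one speaker model.
    return _logsumexp(_component_log_likelihoods(X, *model)).mean()

def identify(X_test, models):
    # Closed-set identification: pick the speaker model with the highest score.
    scores = {spk: avg_log_likelihood(X_test, m) for spk, m in models.items()}
    return max(scores, key=scores.get)

# Synthetic stand-in for MFCC frames (NOT data from the study).
rng = np.random.default_rng(42)
speaker_offsets = {"spk_a": 0.0, "spk_b": 3.0, "spk_c": -3.0}
train = {s: rng.normal(mu, 1.0, size=(500, 13)) for s, mu in speaker_offsets.items()}
test = {s: rng.normal(mu, 1.0, size=(200, 13)) for s, mu in speaker_offsets.items()}

models = {s: train_gmm_em(X, n_components=4) for s, X in train.items()}
predictions = {s: identify(X, models) for s, X in test.items()}
recognition_rate = np.mean([predictions[s] == s for s in test])
```

The recognition rate here is simply the fraction of test utterances assigned to the correct speaker. Varying `n_components` is one way to explore the mixture-count effect the abstract describes, though with toy data like this the degradation at high mixture counts will not necessarily appear.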
ISSN:	2151-7614 2151-7622
DOI:	10.1109/ICBBE.2009.5162690