Comparison of speaker normalization techniques for classification of emotionally disturbed subjects based on voice

Bibliographic details
Main authors: Subari, K S, Wilkes, D M, Shiavi, R G, Silverman, S E, Silverman, M K
Format: Conference proceeding
Language: English
Description
Summary: When reviewing his clinical experience in treating suicidal patients, one of the authors observed that successful predictions of suicidality were often based on the patient's voice, independent of content. Research has shown that the Gaussian mixture model of the mel-cepstral features of speech can be used to distinguish the speech of suicidal persons from that of depressed and control persons with high classification rates. Since vocal tract length varies from person to person, can the classification rates for suicidal persons be improved through speaker normalization? We approach this problem by warping the frequency axis of the mel-cepstral features. Two different approaches yielded the best results: (i) using the maximum-likelihood approach in a gender-independent database to compute the warping factor for a nonlinear warp, and (ii) using a transformation of the first three formants in a gender-dependent database to compute the warping factor for a linear warp.
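The frequency-axis warping mentioned in the summary can be illustrated with a minimal Python sketch. The record does not give the exact warping functions or the likelihood model, so everything here is an assumption: `linear_warp` scales the axis by a factor `alpha`, `bilinear_warp` uses the common all-pass (bilinear) form as one possible nonlinear warp, and `pick_warp_factor` is a hypothetical grid search that returns whichever candidate factor a caller-supplied scoring function rates highest (in a maximum-likelihood setup that score would be the log-likelihood of the warped features under the speaker model).

```python
import math


def linear_warp(freq_hz, alpha, nyquist_hz):
    """Linear warp: scale the frequency axis by alpha, clipped at Nyquist."""
    return min(alpha * freq_hz, nyquist_hz)


def bilinear_warp(freq_hz, alpha, nyquist_hz):
    """Nonlinear all-pass (bilinear) warp, assumed here for illustration.

    alpha = 0 leaves the axis unchanged; 0 Hz and Nyquist stay fixed.
    """
    omega = math.pi * freq_hz / nyquist_hz  # normalize to [0, pi]
    warped = omega + 2.0 * math.atan2(
        alpha * math.sin(omega), 1.0 - alpha * math.cos(omega)
    )
    return warped * nyquist_hz / math.pi


def pick_warp_factor(score, candidates):
    """Hypothetical grid search: return the candidate with the highest score."""
    return max(candidates, key=score)


if __name__ == "__main__":
    # Toy score standing in for a model log-likelihood (not from the paper).
    best = pick_warp_factor(lambda a: -(a - 1.04) ** 2, [0.9, 1.0, 1.04, 1.1])
    print(best, linear_warp(1000.0, best, 4000.0))
```

In practice the warp would be applied to the filterbank center frequencies before the cepstral features are computed, and the scoring function would come from the trained model rather than the toy quadratic used above.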
DOI:10.1109/IECBES.2010.5742248