Automatic intelligibility classification of sentence-level pathological speech

Bibliographic details
Published in: Computer speech & language 2015-01, Vol. 29 (1), pp. 132-144
Main authors: Kim, Jangwon; Kumar, Naveen; Tsiartas, Andreas; Li, Ming; Narayanan, Shrikanth S.
Format: Article
Language: English
Online access: Full text
Description
Summary:
• We propose novel sentence-level features to capture atypical variation.
• Our sentence-level features are effective for intelligibility classification.
• We propose a post-classification posterior smoothing scheme.
• Our smoothing scheme improves classification accuracy of our systems.
• We test feature-level and subsystem fusions for the final intelligibility decision.

Pathological speech usually refers to speech distortion resulting from atypicalities in voice and/or in the articulatory mechanisms, owing to disease, illness, or other physical or biological insult to the production system. Automatic evaluation of speech intelligibility and quality could assist experts in diagnosis and treatment design in these scenarios, but the many sources and types of variability make it a very challenging computational problem. In this work, we propose novel sentence-level features to capture abnormal variation in the prosodic, voice quality, and pronunciation aspects of pathological speech. In addition, we propose a post-classification posterior smoothing scheme that refines the posterior of a test sample based on the posteriors of other test samples. Finally, we perform feature-level fusion and subsystem decision fusion to arrive at a final intelligibility decision. Performance is evaluated on two pathological speech datasets, the NKI CCRT Speech Corpus (advanced head and neck cancer) and the TORGO database (cerebral palsy or amyotrophic lateral sclerosis), by measuring classification accuracy with no subject's data shared between the training and test partitions. Results show that the feature sets of the voice quality, prosodic, and pronunciation subsystems each offer significant discriminating power for binary intelligibility classification. We observe that the proposed posterior smoothing in the acoustic space can further reduce classification errors. The smoothed posterior score fusion of subsystems shows the best classification performance: 73.5% unweighted and 72.8% weighted average recall over the two binary classes.
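
The summary does not spell out the smoothing rule or the scoring formulas, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that "smoothing in the acoustic space" can be approximated by blending each test sample's posterior with the mean posterior of its nearest test-set neighbours in feature space. The function names and the parameters `k` and `alpha` are hypothetical. The second function shows how unweighted and weighted average recall are conventionally computed.

```python
import numpy as np


def smooth_posteriors(features, posteriors, k=3, alpha=0.5):
    """Blend each test posterior with the mean posterior of its k nearest
    test-set neighbours (Euclidean distance in feature space).

    ASSUMPTION: this neighbour-averaging rule, and the values of k and
    alpha, are illustrative choices that do not come from the paper.
    """
    features = np.asarray(features, dtype=float)
    posteriors = np.asarray(posteriors, dtype=float)
    # Pairwise squared Euclidean distances between all test samples.
    diffs = features[:, None, :] - features[None, :, :]
    dist = np.sum(diffs ** 2, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    neighbour_mean = posteriors[neighbours].mean(axis=1)
    return (1.0 - alpha) * posteriors + alpha * neighbour_mean


def average_recalls(y_true, y_pred):
    """Return (unweighted, weighted) average recall over the classes in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    # Per-class recall: fraction of each class's samples predicted correctly.
    recalls = np.array([np.mean(y_pred[y_true == c] == c) for c in classes])
    counts = np.array([np.sum(y_true == c) for c in classes])
    unweighted = recalls.mean()  # every class counts equally
    weighted = np.sum(recalls * counts) / counts.sum()  # equals overall accuracy
    return unweighted, weighted
```

With imbalanced classes the two scores differ, which is why both are reported: the unweighted average treats the rarer class as equally important, while the weighted average reduces to overall accuracy.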
ISSN: 0885-2308, 1095-8363
DOI: 10.1016/j.csl.2014.02.001