Comparison of Cepstral Analysis Based on Voiced-Segment Extraction and Voice Tasks for Discriminating Dysphonic and Normophonic Korean Speakers

Bibliographic details
Published in: Journal of Voice, 2021-03, Vol. 35 (2), p. 328.e11-328.e22
Main authors: Kim, Geun-Hyo; Bae, In-Ho; Park, Hee-June; Lee, Yeon-Woo
Format: Article
Language: English
Description
Summary: This study investigated whether the discriminatory power of cepstral analysis differs according to the voiced-segment extraction method and the voice task used to distinguish dysphonic from normophonic Korean speakers. A total of 2,863 subjects (2,595 with and 268 without dysphonia) were included. A 3-second sustained vowel (SV) /a/ and one sentence of “Sanchaek” were edited and analyzed using Praat scripts. Cepstral analyses (cepstral peak prominence [CPP], smoothed cepstral peak prominence [CPPS], and low/high spectral ratio [LHRatio]) were performed on three voice tasks, namely SV, continuous speech (CS), and extracted continuous speech (EXT) samples. Additionally, auditory-perceptual (A-P) assessments were performed by three speech-language pathologists. Significant differences were found between the dysphonic and normophonic groups for all cepstral parameters except LHRatio_EXT. Cepstral measurements of both SV and CS were highly correlated with A-P ratings. Furthermore, for CS, the diagnostic predictive power of CPP and CPPS, expressed as the area under the receiver operating characteristic curve (AUC), was >0.919, the positive likelihood ratio (LR+) was ≥6.85, and the negative likelihood ratio (LR−) was ≤0.23. For EXT, the AUC was >0.816, LR+ was ≥3.10, and LR− was ≤0.33. Both CS and EXT can predict dysphonia relatively well (AUC > 0.816), although EXT showed lower predictability than analysis of the original sample (CS). Subsequent studies should implement voiced-segment extraction methods using various algorithms.
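The summary reports AUC, LR+, and LR− without defining them. As a minimal sketch using the standard textbook definitions (not the authors' Praat or statistical pipeline), the Python below computes the likelihood ratios from a sensitivity and specificity, and the AUC from two groups of scores. The function names and all numeric inputs are hypothetical illustrations, not values from the study. Because lower CPP and CPPS values generally indicate more severe dysphonia, the raw measure would be negated before being passed in as a score where higher means "more likely dysphonic".

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary diagnostic test.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count half).

    Scores must be oriented so that higher means "more likely positive".
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical sensitivity and specificity, for illustration only.
    lr_pos, lr_neg = likelihood_ratios(0.80, 0.90)
    print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")

    # Hypothetical CPPS values in dB; negated so higher = more dysphonic.
    dysphonic = [-8.1, -9.4, -10.2, -12.0]
    normophonic = [-13.5, -14.2, -11.0, -15.1]
    print(f"AUC = {auc_from_scores(dysphonic, normophonic):.3f}")
```

Read this way, an LR+ well above 1 means a positive result raises the odds of dysphonia substantially, and an LR− close to 0 means a negative result lowers them substantially. This is consistent with the summary's finding that CS discriminates better than EXT.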
ISSN:0892-1997
1873-4588
DOI:10.1016/j.jvoice.2019.09.009