Unified wavelet-based framework for evaluation of voice impairment

Bibliographic details
Published in: International Journal of Speech Technology, 2022, Vol. 25 (2), p. 527-548
Main authors: Gidaye, Girish; Nirmal, Jagannath; Ezzine, Kadria; Frikha, Mondher
Format: Article
Language: English
Description
Abstract: Laryngeal pathologies significantly affect quality of life, verbal communication, and professional life. Most organic vocal pathologies alter the shape and vibration pattern of the vocal fold(s). Many automatic, computer-based, non-intrusive systems for rapid detection and progression tracking have been introduced in recent years. This paper proposes an integrated wavelet-based framework for evaluating voice condition that is independent of human bias and language. The true voice source is extracted using quasi-closed phase (QCP) glottal inverse filtering to capture the altered vocal fold dynamics. The voice source is decomposed using the stationary wavelet transform (SWT), and statistical and energy measures that are independent of the fundamental frequency are extracted from each spectral sub-band to quantify the voice source. Because multilevel stationary wavelet decomposition yields a high-dimensional feature vector, information gain-based feature ranking is used to select the most discriminative features. Speech samples of the sustained vowel /a/ drawn from four distinct databases in German, Spanish, English, and Arabic are used to perform intra- and cross-database experiments. The effect of the decomposition level on detection and classification accuracy is examined, and the fifth level of decomposition is found to give the highest recognition rate. The performance metrics achieved by the classifiers suggest that SWT-based energy and statistical features reveal more useful information on pathological voices, and thus the proposed system can be used as a complementary tool for the clinical diagnosis of laryngeal pathologies.
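
The feature pipeline outlined in the abstract (multilevel stationary wavelet decomposition, per-band energy and statistical measures, information gain ranking) can be illustrated with a minimal sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: the Haar wavelet, periodic boundary handling, averaging filters scaled by 1/2 (so the bands sum back to the input), the three measures per band, and binarizing each feature at its median before computing information gain. The abstract specifies none of these. All function names are hypothetical, and the QCP inverse-filtering step is skipped, so the input is simply any numeric sequence.

```python
import math
from collections import Counter


def haar_swt(signal, levels):
    """Undecimated (a trous) Haar decomposition with periodic extension.

    Returns [D1, ..., D_levels, A_levels]; every band keeps the input length.
    Assumption: Haar filters scaled by 1/2, not the paper's wavelet choice.
    """
    approx = list(signal)
    n = len(approx)
    bands = []
    for level in range(levels):
        step = 2 ** level  # spread the filter taps instead of downsampling
        shifted = [approx[(i + step) % n] for i in range(n)]
        detail = [(a - b) / 2 for a, b in zip(approx, shifted)]
        approx = [(a + b) / 2 for a, b in zip(approx, shifted)]
        bands.append(detail)
    bands.append(approx)
    return bands


def band_features(bands):
    """Relative energy, mean and standard deviation of each sub-band."""
    energies = [sum(v * v for v in band) for band in bands]
    total = sum(energies) or 1.0
    features = []
    for band, energy in zip(bands, energies):
        mean = sum(band) / len(band)
        std = math.sqrt(sum((v - mean) ** 2 for v in band) / len(band))
        features.extend([energy / total, mean, std])
    return features


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    counts = Counter(labels)
    return -sum(c / len(labels) * math.log2(c / len(labels))
                for c in counts.values())


def information_gain(values, labels):
    """Gain of one feature binarized at its median (assumed discretization)."""
    threshold = sorted(values)[len(values) // 2]
    gain = entropy(labels)
    for side in (True, False):
        subset = [y for v, y in zip(values, labels) if (v >= threshold) == side]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain


def rank_features(rows, labels):
    """Feature indices ordered from most to least informative."""
    gains = [information_gain([row[j] for row in rows], labels)
             for j in range(len(rows[0]))]
    return sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)
```

In use, each recording would be passed through haar_swt with five levels (the level the abstract reports as best), band_features would turn the six resulting bands into one feature row, and rank_features would order the columns over all labeled recordings so that only the top-ranked ones are given to a classifier.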
ISSN: 1381-2416, 1572-8110
DOI: 10.1007/s10772-022-09969-6