Stockwell-Transform based feature representation for detection and assessment of voice disorders
Published in: | International Journal of Speech Technology 2024-03, Vol.27 (1), p.101-119 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | In the literature, various time-frequency representation methods have been investigated for the automatic detection of voice disorders. The Stockwell Transform (S-Transform) provides good time-frequency localization; hence, it may efficiently capture voice-disorder-related information from the speech signal. With this motivation, we investigated different variants of the S-Transform for the classification of voice disorders. This study proposes S-Transform based cepstral coefficients for voice disorder detection and assessment. The performance of the proposed features was compared with baseline features on the SVD and HUPA databases. For the voice disorder detection task, the proposed features outperformed the baseline features, achieving classification accuracies of 80.2% on HUPA and 79.8% on SVD. The proposed features also performed better on the assessment task. Further, the experimental results reveal that combining the cepstral coefficients derived from the S-Transform with the baseline features improved the performance of the proposed systems by 8% for detection and 4% for assessment, which highlights the complementary nature of the explored features. We also analysed the effectiveness of the S-Transform based spectral representation in capturing the acoustic characteristics of various voice qualities, namely breathy, harsh, creaky, and falsetto phonation. This representation was also compared with other time-frequency methods such as STFT, ZTW, and SFF. The S-Transform was observed to capture the acoustic variations associated with different voice qualities more effectively than these baseline methods, which may be due to the better spectro-temporal resolution it offers. |
---|---|
ISSN: | 1381-2416 1572-8110 |
DOI: | 10.1007/s10772-024-10085-w |
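
The abstract does not spell out how the S-Transform or the cepstral coefficients are computed. As a rough illustration only, the sketch below assumes the standard FFT-based discrete S-Transform (a frequency-dependent Gaussian window applied to the shifted spectrum) followed by a plain log-magnitude and DCT-II step. The function names `stockwell_transform` and `cepstral_coefficients`, the default `n_coeffs=13`, and the absence of any filterbank are assumptions made for this example and are not taken from the article.

```python
import numpy as np


def stockwell_transform(x):
    """Discrete S-Transform of a real 1-D signal (standard FFT formulation).

    Returns a complex array of shape (N // 2 + 1, N):
    rows are frequency bins 0..N//2, columns are time samples.
    """
    x = np.asarray(x, dtype=float)
    N = x.size
    X = np.fft.fft(x)
    # Signed integer frequency offsets (0, 1, ..., -2, -1) so the
    # Gaussian window wraps around correctly.
    m = np.fft.fftfreq(N) * N
    n_freqs = N // 2 + 1
    S = np.zeros((n_freqs, N), dtype=complex)
    # The zero-frequency row is defined as the signal mean.
    S[0, :] = x.mean()
    for n in range(1, n_freqs):
        # Gaussian window in the frequency domain; its width grows with n,
        # which gives finer time resolution at higher frequencies.
        gauss = np.exp(-2.0 * np.pi**2 * m**2 / n**2)
        # np.roll(X, -n)[m] == X[(m + n) % N], i.e. the spectrum shifted by n.
        S[n, :] = np.fft.ifft(np.roll(X, -n) * gauss)
    return S


def cepstral_coefficients(S, n_coeffs=13, eps=1e-10):
    """Illustrative cepstral step (an assumption, not the article's recipe):
    log magnitude along frequency, then an unnormalized DCT-II per time sample.

    Returns an array of shape (n_coeffs, n_times).
    """
    log_mag = np.log(np.abs(S) + eps)
    n_freqs = log_mag.shape[0]
    k = np.arange(n_coeffs)[:, None]
    f = np.arange(n_freqs)[None, :]
    basis = np.cos(np.pi * k * (2 * f + 1) / (2 * n_freqs))
    return basis @ log_mag
```

Applied to a short frame, for example `cepstral_coefficients(stockwell_transform(frame))`, this yields one coefficient vector per time sample. The loop costs one inverse FFT per frequency row, so in practice the transform would be applied to short frames rather than to whole recordings.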