Voicing-, voiceless-, and non-glimpses in speech intelligibility prediction
Published in: | The Journal of the Acoustical Society of America 2023-03, Vol.153 (3_supplement), p.A172-A172 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Abstract: | The number of speech spectro-temporal (S-T) regions escaping from noise masking, known as “glimpses,” is proportional to speech intelligibility in noise. Previous studies have demonstrated that intelligibility can be estimated by calculating the glimpse proportion (GP). More recent evidence revealed that the contribution of glimpses to intelligibility varies with the energy level of the glimpsed regions, and that even non-glimpsed regions play a non-negligible role in speech perception in noise. This study incorporated voicing-voiceless information into glimpse-based intelligibility estimation. Before computing the GP, the counts of raw glimpsed regions, or of those with energy above the mean noise level, were weighted according to the voicing-voiceless status of the frame in which the glimpses were detected. In an evaluation using speech signals processed to have thirteen glimpse compositions in both temporally stationary and fluctuating noise maskers, the linear correlation between model predictions and listeners' word recognition rates increased from 0.76 to 0.80 for weighted GP, and from 0.89 to 0.92 for weighted high-energy GP. Further taking the contribution of non-glimpsed regions into account improved the correlation to 0.95, suggesting that intelligibility in noise can be better predicted when the contributions of different speech regions are finely modelled. |
---|---|
ISSN: | 0001-4966, 1520-8524 |
DOI: | 10.1121/10.0018560 |
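
The weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the abstract gives neither the glimpse criterion, the weight values, nor the normalisation. The function name `weighted_glimpse_proportion`, the 3 dB local SNR threshold, the weights `w_voiced` and `w_voiceless`, and the choice to normalise by the weighted total number of regions are all assumptions made here for illustration.

```python
import numpy as np

def weighted_glimpse_proportion(speech_db, noise_db, voiced,
                                w_voiced=1.0, w_voiceless=1.0,
                                local_snr_db=3.0, high_energy_only=False):
    """Illustrative voicing-weighted glimpse proportion (hypothetical helper).

    speech_db, noise_db: arrays of shape (channels, frames) with the level
        of speech and noise in each spectro-temporal region, in dB.
    voiced: boolean array of shape (frames,), True for voiced frames.
    """
    speech_db = np.asarray(speech_db, dtype=float)
    noise_db = np.asarray(noise_db, dtype=float)
    voiced = np.asarray(voiced, dtype=bool)

    # A region counts as a glimpse when speech exceeds the noise by the
    # local SNR criterion (the 3 dB default is an assumption).
    mask = (speech_db - noise_db) > local_snr_db

    # Optional high-energy variant: keep only glimpses whose speech level
    # lies above the mean noise level.
    if high_energy_only:
        mask = mask & (speech_db > noise_db.mean())

    # One weight per frame, chosen by the frame's voicing status.
    frame_w = np.where(voiced, w_voiced, w_voiceless)

    # Weighted glimpse count over the weighted number of regions, so that
    # unit weights reduce to the plain glimpse proportion (assumed choice).
    weighted_count = (mask * frame_w[np.newaxis, :]).sum()
    weighted_total = mask.shape[0] * frame_w.sum()
    return float(weighted_count / weighted_total)


# Toy data: 2 channels, 4 frames, the first two frames voiced.
speech = [[10, 0, 10, 0],
          [10, 10, 0, 0]]
noise = np.zeros((2, 4))
voiced = [True, True, False, False]

plain_gp = weighted_glimpse_proportion(speech, noise, voiced)
weighted_gp = weighted_glimpse_proportion(speech, noise, voiced,
                                          w_voiced=2.0, w_voiceless=1.0)
```

With unit weights the function returns the ordinary GP (4 of the 8 regions are glimpsed in the toy data). Giving voiced frames twice the weight raises the value here because three of the four glimpses fall in voiced frames.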