Modeling speech localization, identification, and word recognition in a multi-talker setting
Published in: The Journal of the Acoustical Society of America, 2017-05, Vol. 141 (5), p. 3693
Main authors: , ,
Format: Article
Language: English
Online access: Full text
Abstract: In many everyday situations, listeners are confronted with complex acoustic scenes. Despite this complexity, they are able to follow and understand one particular talker. This contribution presents auditory models that aim to solve speech-related tasks in multi-talker settings. The main characteristics of the models are: (1) restriction to salient auditory features (“glimpses”); (2) use of periodicity, periodic energy, and binaural features; and (3) template-based classification methods using clean speech models. Further classification approaches using state-space models will be discussed. Model performance is evaluated against human psychoacoustic data [e.g., Brungart and Simpson, Perception & Psychophysics, 2007, 69(1), 79-91; Schoenmaker and van de Par, Physiology, Psychoacoustics and Cognition in Normal and Impaired Hearing, 2016, 73-81]. The model results were largely consistent with the human results. This suggests that sparse glimpses of periodicity-related monaural and binaural auditory features provide sufficient information about a complex auditory scene involving multiple talkers. Furthermore, it can be concluded that clean speech models are sufficient to decode speech information from the glimpses derived from a complex scene, i.e., computationally complex models of sound source superposition are not required for decoding a speech stream.
ISSN: 0001-4966, 1520-8524
DOI: 10.1121/1.4988045
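
The abstract's two central ideas, keeping only salient feature "glimpses" and classifying them against templates built from clean speech, can be illustrated with a minimal sketch. The following Python snippet is not the authors' implementation: the feature representation, the saliency measure, the threshold, and all function names are illustrative assumptions standing in for the periodicity-related monaural and binaural features described above.

```python
# Minimal sketch (illustrative only, not the published model):
# (1) keep only "glimpsed" feature frames whose saliency exceeds a threshold,
# (2) classify the glimpses against clean-speech templates by nearest match.
import numpy as np

def extract_glimpses(features, saliency, threshold):
    """Keep only the feature frames whose saliency exceeds the threshold."""
    mask = saliency > threshold
    return features[mask], mask

def classify_with_templates(glimpses, templates):
    """Each glimpsed frame votes for the nearest clean-speech template
    (minimum Euclidean distance over that template's frames)."""
    votes = np.zeros(len(templates), dtype=int)
    for frame in glimpses:
        dists = [np.linalg.norm(t - frame, axis=1).min() for t in templates]
        votes[int(np.argmin(dists))] += 1
    return int(np.argmax(votes)), votes

# Toy demonstration with random stand-ins for clean-speech templates.
rng = np.random.default_rng(0)
templates = [rng.normal(0.0, 1.0, size=(20, 8)),  # template for "word A"
             rng.normal(2.0, 1.0, size=(20, 8))]  # template for "word B"
mixture = rng.normal(2.0, 1.0, size=(50, 8))      # frames dominated by "word B"
saliency = rng.uniform(size=50)                   # stand-in per-frame saliency
glimpses, mask = extract_glimpses(mixture, saliency, threshold=0.7)
label, votes = classify_with_templates(glimpses, templates)
print(f"kept {int(mask.sum())}/50 frames as glimpses, votes={votes}, "
      f"winner: word {'AB'[label]}")
```

The per-frame voting is one simple stand-in for template-based classification; the actual models would operate on periodicity and binaural feature maps rather than random frames, and the discussed state-space approaches would replace the frame-wise nearest-template rule with a temporal model.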