Sparse periodicity‐based auditory features explain human performance in a spatial multitalker auditory scene analysis task

Bibliographic details
Published in: The European journal of neuroscience 2020-03, Vol.51 (5), p.1353-1363
Main authors: Josupeit, Angela; Schoenmaker, Esther; Par, Steven; Hohmann, Volker
Format: Article
Language: English
Description
Summary: Human listeners robustly decode speech information from a talker of interest that is embedded in a mixture of spatially distributed interferers. A relevant question is which time‐frequency segments of the speech are predominantly used by a listener to solve such a complex Auditory Scene Analysis task. A recent psychoacoustic study investigated the relevance of low signal‐to‐noise ratio (SNR) components of a target signal on speech intelligibility in a spatial multitalker situation. For this, a three‐talker stimulus was manipulated in the spectro‐temporal domain such that target speech time‐frequency units below a variable SNR threshold (SNRcrit) were discarded while keeping the interferers unchanged. The psychoacoustic data indicate that only target components at and above a local SNR of about 0 dB contribute to intelligibility. This study applies an auditory scene analysis “glimpsing” model to the same manipulated stimuli. Model data are found to be similar to the human data, supporting the notion of “glimpsing,” that is, that salient speech‐related information is predominantly used by the auditory system to decode speech embedded in a mixture of sounds, at least for the tested conditions of three overlapping speech signals. This implies that perceptually relevant auditory information is sparse and may be processed with low computational effort, which is relevant for neurophysiological research of scene analysis and novelty processing in the auditory system.

Comparing auditory “glimpses” (circles) derived from signal segments with salient temporal periodicity to clean‐speech templates (grey areas) yielded speech perception estimates similar to human performance in spatial conditions with three competing talkers. This implies that perceptually relevant information is sparse and may be processed with low computational effort. Findings are relevant for neurophysiological research of scene analysis and novelty processing in the auditory system.
ISSN: 0953-816X, 1460-9568
DOI: 10.1111/ejn.13981
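
The stimulus manipulation described in the summary (discarding target time‐frequency units whose local SNR falls below SNRcrit while leaving the interferers untouched) can be illustrated with a minimal sketch. The summary does not specify the time‐frequency decomposition or the implementation, so everything here is an assumption for illustration: the function name `discard_low_snr_units`, the representation of each signal as an array of per‐unit power, and the toy values are all hypothetical and are not taken from the study.

```python
import numpy as np

def discard_low_snr_units(target_tf, interferer_tf, snr_crit_db):
    """Zero the target time-frequency units whose local SNR is below snr_crit_db.

    target_tf, interferer_tf: same-shaped arrays (frequency x time) holding the
    power of the target and of the summed interferers in each unit.
    This power-based representation is an assumption, not the study's method.
    """
    eps = np.finfo(float).tiny  # guards against log10(0) and division by zero
    local_snr_db = 10.0 * np.log10((target_tf + eps) / (interferer_tf + eps))
    keep = local_snr_db >= snr_crit_db  # units at and above the threshold survive
    return np.where(keep, target_tf, 0.0), keep

# Toy 2 x 2 example with hypothetical power values.
target = np.array([[4.0, 1.0], [0.5, 2.0]])
interferers = np.array([[1.0, 1.0], [2.0, 2.0]])
masked, keep = discard_low_snr_units(target, interferers, snr_crit_db=0.0)
```

With a threshold of 0 dB, only the unit where the target power (0.5) is below the interferer power (2.0) is discarded; the interferer array is not modified, matching the description that the interferers are kept unchanged.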