Attentive Tracking of Sound Sources

Bibliographic Details
Published in:	Current biology 2015-08, Vol.25 (17), p.2238-2246
Main authors:	Woods, Kevin J.P.; McDermott, Josh H.
Format:	Article
Language:	English
Subjects:	Adult; Attention; Cues; Female; Humans; Male; Perceptual Masking; Sound Spectrography; Speech Acoustics; Speech Perception; Young Adult
Online access:	Full text

Description
Summary:	Auditory scenes often contain concurrent sound sources, but listeners are typically interested in just one of these and must somehow select it for further processing. One challenge is that real-world sounds such as speech vary over time and as a consequence often cannot be separated or selected based on particular values of their features (e.g., high pitch). Here we show that human listeners can circumvent this challenge by tracking sounds with a movable focus of attention. We synthesized pairs of voices that changed in pitch and timbre over random, intertwined trajectories, lacking distinguishing features or linguistic information. Listeners were cued beforehand to attend to one of the voices. We measured their ability to extract this cued voice from the mixture by subsequently presenting the ending portion of one voice and asking whether it came from the cued voice. We found that listeners could perform this task but that performance was mediated by attention: listeners who performed best were also more sensitive to perturbations in the cued voice than in the uncued voice. Moreover, the task was impossible if the source trajectories did not maintain sufficient separation in feature space. The results suggest a locus of attention that can follow a sound’s trajectory through a feature space, likely aiding selection and segregation amid similar distractors.
• Humans track sound sources through feature space with a movable focus of attention
• Attentive tracking aids segregation of similar sound sources
• Tracking failures occur if sound sources pass nearby in feature space
• Tracking is robust to speech-like source discontinuities
Hearing a sound source of interest amid other sources (the “cocktail party problem”) is difficult when sources are similar and change over time, as in speech. Woods and McDermott show that humans segregate sources in such situations using attentive tracking, employing a moving locus of attention to follow a sound as it changes over time.
ISSN:	0960-9822 1879-0445
DOI:	10.1016/j.cub.2015.07.043