Echoic log-surprise: A multi-scale scheme for acoustic saliency detection
Published in: | Expert systems with applications 2018-12, Vol.114, p.255-266 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | • A new model to compute acoustic saliency based on Bayesian log-surprise is proposed.
• Our model implements a multi-scale scheme inspired by Acoustic Sensory Memory.
• We validated our proposal using Acoustic Event Detection datasets.
• Performance compared favorably against state-of-the-art saliency algorithms.
Perceptual signals such as acoustic or visual cues carry a massive amount of information. From a human perspective, the problem of handling this much information is solved by means of cognitive mechanisms related to attention. In particular, saliency is the property of certain stimuli that makes them stand out from others, allowing the brain to make decisions about their relevance while exploring the world.
For artificial intelligence systems it is advantageous to mimic these mechanisms. Visual saliency algorithms have been successfully employed in tasks such as medical diagnosis, detection of violent scenes, and environment understanding by robots. In contrast, computational models of acoustic saliency mechanisms are less widespread. In this context, we propose a novel acoustic saliency algorithm for intelligent and expert systems facing tasks such as sound detection and classification, early alarm, surveillance, and robotic exploration of the surroundings, among many other applications.
This technique, which we term echoic log-surprise, combines an unsupervised statistical approach based on Bayesian log-surprise with the biological concept of echoic or Auditory Sensory Memory. Our algorithm computes several independent log-surprise cues in parallel over a wide range of memory values, with the aim of leveraging saliency information from different temporal scales. We then explore several statistical metrics for combining these multi-scale signals into a single temporal saliency signal, including Rényi entropy, Jensen-Shannon divergence, and the Cramér and Bhattacharyya distances. We adopted Acoustic Event Detection tasks as proxies to test its performance. Results show that the proposed echoic log-surprise method outperforms classical acoustic detection techniques commonly deployed in intelligent and expert systems, such as energy thresholding or voice activity detection, and also achieves better results than some other state-of-the-art acoustic saliency algorithms, such as Kalinli’s and conventional log-surprise. |
---|---|
ISSN: | 0957-4174, 1873-6793 |
DOI: | 10.1016/j.eswa.2018.07.018 |
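
The summary describes computing log-surprise cues at several memory lengths in parallel and then fusing them with a statistic such as Rényi entropy. The Python sketch below illustrates that general idea only; it is not the authors' implementation. Everything specific in it is an assumption made for illustration: the input is a 1-D feature sequence (for example frame energy), each memory window is modeled as a Gaussian, surprise is the KL divergence between the window that includes the newest frame and the window before it, and the fusion step simply reports the Rényi entropy of the surprise values normalized across scales. The function names (`gaussian_kl`, `log_surprise`, `echoic_scales`, `renyi_entropy`, `fuse_renyi`) and the memory lengths are hypothetical. How the paper maps such a statistic to a final saliency score is not stated in the summary.

```python
import numpy as np


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) between two 1-D Gaussians given means and variances."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def log_surprise(x, memory, eps=1e-8):
    """Log-surprise per frame for one memory length (assumed Gaussian model).

    Prior: Gaussian fit to the `memory` frames before frame t.
    Posterior: Gaussian fit to the same-length window ending at frame t.
    Frames with fewer than `memory` predecessors are left at zero.
    """
    x = np.asarray(x, dtype=float)
    out = np.zeros(len(x))
    for t in range(memory, len(x)):
        prior = x[t - memory:t]
        post = x[t - memory + 1:t + 1]
        kl = gaussian_kl(post.mean(), post.var() + eps,
                         prior.mean(), prior.var() + eps)
        out[t] = np.log1p(max(kl, 0.0))  # clamp tiny negative rounding errors
    return out


def echoic_scales(x, memories):
    """Stack one log-surprise signal per memory length: (n_scales, n_frames)."""
    return np.stack([log_surprise(x, m) for m in memories])


def renyi_entropy(p, alpha=2.0, axis=0):
    """Rényi entropy of order alpha (alpha != 1) along `axis`."""
    return np.log(np.sum(np.asarray(p, dtype=float) ** alpha, axis=axis)) / (1.0 - alpha)


def fuse_renyi(scales, alpha=2.0, eps=1e-12):
    """Assumed fusion rule: entropy of the per-frame distribution over scales."""
    p = scales + eps
    p = p / p.sum(axis=0, keepdims=True)
    return renyi_entropy(p, alpha=alpha, axis=0)


# Toy input: a quiet sinusoidal background with a loud burst at frames 200-219.
x = 0.1 * np.sin(0.3 * np.arange(400))
x[200:220] += 5.0

memories = (8, 32, 128)          # hypothetical memory lengths, in frames
scales = echoic_scales(x, memories)
fused = fuse_renyi(scales)
```

In this toy signal every scale reacts at the burst onset (frame 200), while short memories recover sooner than long ones, which is the kind of complementary information a multi-scale scheme is meant to exploit. The Gaussian model and the entropy-based fusion are placeholders; the other statistics named in the summary could be swapped in for `fuse_renyi`.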