DcaseNet: An integrated pretrained deep neural network for detecting and classifying acoustic scenes and events
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Online access: | Order full text |
Summary: | Although acoustic scenes and events include many related tasks, their
combined detection and classification have been scarcely investigated. We
propose three architectures of deep neural networks that are integrated to
simultaneously perform acoustic scene classification, audio tagging, and sound
event detection. The first two architectures are inspired by human cognitive
processes. The first architecture resembles the short-term perception that
adults use for scene classification: they detect various sound events and then
use them to identify the acoustic scene. The second architecture resembles the
long-term learning of babies, which is also the concept underlying
self-supervised learning. Babies first observe the effects of abstract notions
such as gravity and then learn specific tasks using such perceptions. The third
architecture extends the second one with a few task-specific layers, each group
serving a single task, placed before the corresponding output layer. We aim to
build an integrated system that can serve as a pretrained model to perform the
three aforementioned tasks.
Experiments on three datasets demonstrate that the proposed architecture,
called DcaseNet, can either be used directly for any of the tasks with
suitable results or be fine-tuned to improve performance on a single task.
The code and pretrained DcaseNet weights are available at
https://github.com/Jungjee/DcaseNet. |
---|---|
DOI: | 10.48550/arxiv.2009.09642 |
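
The summary describes one network that serves three tasks at once, with a few task-specific layers placed before each output layer. A minimal sketch of that general idea in plain Python follows. It is not the authors' implementation (see the repository linked in the summary for that): the class name `MultiTaskSketch`, all layer sizes, the mean pooling, and the random weights are assumptions chosen only for illustration.

```python
import math
import random


def init(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def dense(x, w, b):
    """Apply a fully connected layer to one feature vector."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + bj for row, bj in zip(w, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def mean_pool(frames):
    """Average frame-level vectors into one clip-level vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


class MultiTaskSketch:
    """Hypothetical shared trunk with three task branches (sizes are assumptions)."""

    def __init__(self, n_feat=8, n_hidden=16, n_scenes=3, n_tags=4, n_events=5, seed=0):
        rng = random.Random(seed)
        self.shared = init(n_feat, n_hidden, rng)          # shared by all tasks
        self.asc_specific = init(n_hidden, n_hidden, rng)  # scene-only layer
        self.tag_specific = init(n_hidden, n_hidden, rng)  # tagging-only layer
        self.sed_specific = init(n_hidden, n_hidden, rng)  # event-detection-only layer
        self.asc_out = init(n_hidden, n_scenes, rng)
        self.tag_out = init(n_hidden, n_tags, rng)
        self.sed_out = init(n_hidden, n_events, rng)

    def forward(self, frames):
        # Shared frame-level representation.
        h = [relu(dense(f, *self.shared)) for f in frames]
        clip = mean_pool(h)
        # Acoustic scene classification: one label per clip (softmax).
        scene = softmax(dense(relu(dense(clip, *self.asc_specific)), *self.asc_out))
        # Audio tagging: several labels per clip (independent sigmoids).
        tags = sigmoid(dense(relu(dense(clip, *self.tag_specific)), *self.tag_out))
        # Sound event detection: several labels per frame.
        events = [
            sigmoid(dense(relu(dense(f, *self.sed_specific)), *self.sed_out))
            for f in h
        ]
        return scene, tags, events


if __name__ == "__main__":
    rng = random.Random(1)
    clip_frames = [[rng.random() for _ in range(8)] for _ in range(10)]
    scene, tags, events = MultiTaskSketch().forward(clip_frames)
    print(len(scene), len(tags), len(events), len(events[0]))
```

In this sketch, the scene and tagging branches consume a pooled clip-level vector, while the event-detection branch keeps the time axis so that it can emit one prediction per frame. Fine-tuning for a single task would, under the same assumptions, mean continuing training with only that branch's loss.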