The Sound of Bounding-Boxes
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | In audio-visual sound source separation, which leverages visual
information to separate sound sources, identifying objects in an image is a
crucial step prior to separating the sound sources. However, existing methods
that assign sound to detected bounding boxes rely heavily on pre-trained
object detectors. Specifically, these methods require predetermining all
possible categories of objects that can produce sound and using an object
detector applicable to all such categories. To tackle this problem, we propose
a fully unsupervised method that learns to detect objects in an image and
separate sound sources simultaneously. Because our method does not rely on any
pre-trained detector, it is applicable to arbitrary categories without any
additional annotation. Furthermore, we found that our method, despite being
fully unsupervised, performs comparably in separation accuracy. |
---|---|
DOI: | 10.48550/arxiv.2203.15991 |