Sound Source Separation Using Latent Variational Block-Wise Disentanglement
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | While neural network approaches have made significant strides in resolving
classical signal processing problems, it is often the case that hybrid
approaches that draw insight from both signal processing and neural networks
produce more complete solutions. In this paper, we present a hybrid classical
digital signal processing/deep neural network (DSP/DNN) approach to source
separation (SS), highlighting the theoretical link between variational
autoencoders and classical approaches to SS. We propose a system that transforms
the single-channel under-determined SS task into an equivalent multichannel
over-determined SS problem in a properly designed latent space. The separation
task in the latent space is treated as finding a variational block-wise
disentangled representation of the mixture. We show empirically that the
design choices and the variational formulation of the task, motivated by
theoretical results from classical signal processing, lead to robustness to
unseen out-of-distribution data and a reduced risk of overfitting. To
address the resulting permutation issue, we explicitly incorporate a novel
differentiable permutation loss function and augment the model with a memory
mechanism to keep track of the statistics of the individual sources. |
---|---|
DOI: | 10.48550/arxiv.2402.06683 |