Jointly Detecting and Separating Singing Voice: A Multi-Task Approach
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Abstract: | A main challenge in applying deep learning to music processing is the
availability of training data. One potential solution is Multi-task Learning,
in which the model also learns to solve related auxiliary tasks on additional
datasets to exploit their correlation. While intuitive in principle, it can be
challenging to identify related tasks and construct the model to optimally
share information between tasks. In this paper, we explore vocal activity
detection as an additional task to stabilise and improve the performance of
vocal separation. Further, we identify problematic biases specific to each
dataset that could limit the generalisation capability of separation and
detection models, to which our proposed approach is robust. Experiments show
improved performance in separation as well as vocal detection compared to
single-task baselines. However, we find that the commonly used
Signal-to-Distortion Ratio (SDR) metrics did not capture the improvement on
non-vocal sections, indicating the need for improved evaluation methodologies. |
DOI: | 10.48550/arxiv.1804.01650 |
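
The abstract's remark that SDR did not capture improvements on non-vocal sections can be illustrated with a minimal sketch. It assumes a simplified energy-ratio definition of SDR, 10·log10(‖s‖² / ‖s − ŝ‖²), which is not necessarily the exact metric implementation used in the paper; the function name `sdr_db` and the toy signals are hypothetical. When the reference vocal track is silent, the numerator is zero and the ratio is undefined, so the score cannot distinguish an estimate that leaks a little accompaniment from one that leaks a lot.

```python
import math


def sdr_db(reference, estimate):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: a plain energy ratio, not the full BSS Eval decomposition.
    Returns None when the ratio is undefined (silent reference or zero error).
    """
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if signal_energy == 0.0 or error_energy == 0.0:
        return None
    return 10.0 * math.log10(signal_energy / error_energy)


# Toy segment with vocals: the score is well defined.
vocal_ref = [0.5, -0.5, 0.5, -0.5]
vocal_est = [0.4, -0.4, 0.4, -0.4]

# Toy non-vocal segment: the reference is silent, the estimates leak
# different amounts of accompaniment into the vocal output.
silent_ref = [0.0, 0.0, 0.0, 0.0]
quiet_leak = [0.01, -0.01, 0.01, -0.01]
loud_leak = [0.3, -0.3, 0.3, -0.3]

print(sdr_db(vocal_ref, vocal_est))    # a finite value in dB
print(sdr_db(silent_ref, quiet_leak))  # None: undefined
print(sdr_db(silent_ref, loud_leak))   # None: undefined, although clearly worse
```

Under this assumption, any gain a model makes by staying quiet on non-vocal sections is invisible to the score, which is consistent with the abstract's call for improved evaluation methodologies.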