Interleaved Multitask Learning for Audio Source Separation with Independent Databases

Deep Neural Network-based source separation methods usually train independent models to optimize for the separation of individual sources. Although this can lead to good performance for well-defined targets, it can also be computationally expensive. The multitask alternative of a single network jointly optimizing for all targets simultaneously usually requires the availability of all target sources for each input. This requirement hampers the ability to create large training databases. In this paper, we present a model that decomposes the learnable parameters into a shared parametric model (encoder) and independent components (decoders) specific to each source. We propose an interleaved training procedure that optimizes the sub-task decoders independently and thus does not require each sample to possess a ground truth for all of its composing sources. Experimental results on MUSDB18 with the proposed method show comparable performance to independently trained models, with fewer trainable parameters, more efficient inference, and an encoder transferable to future target objectives. The results also show that using the proposed interleaved training procedure leads to better Source-to-Interference energy ratios when compared to the simultaneous optimization of all training objectives, even when all composing sources are available.
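The shared-encoder / per-source-decoder scheme with interleaved updates that the abstract describes can be sketched as a toy example. The following is an illustrative sketch only, not the paper's implementation: linear layers stand in for the networks, and the source names, dimensions, learning rate, and synthetic targets are all assumptions made for the example. The key point it demonstrates is that a sample lacking ground truth for one source (here "bass") still updates the shared encoder and the decoders for the sources it does have.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative, not from the paper).
d_in, d_hid = 8, 4
sources = ["vocals", "drums", "bass"]

# Shared encoder and one independent decoder per target source.
W_enc = rng.normal(scale=0.1, size=(d_hid, d_in))
W_dec = {s: rng.normal(scale=0.1, size=(d_in, d_hid)) for s in sources}
bass_init = W_dec["bass"].copy()

def interleaved_step(x, targets, lr=0.01):
    """One interleaved update: each source with available ground truth
    updates the shared encoder plus only its own decoder."""
    global W_enc
    for s, y in targets.items():          # samples may lack some sources
        h = W_enc @ x                     # shared representation
        err = W_dec[s] @ h - y            # MSE residual for this source
        g_dec = np.outer(err, h)          # gradient w.r.t. this decoder
        g_enc = np.outer(W_dec[s].T @ err, x)  # gradient w.r.t. encoder
        W_dec[s] -= lr * g_dec
        W_enc -= lr * g_enc

# A mixture frame whose ground truth exists for only two of three sources.
x = rng.normal(size=d_in)
partial_truth = {"vocals": 0.6 * x, "drums": 0.3 * x}

def vocals_err():
    return float(np.linalg.norm(W_dec["vocals"] @ (W_enc @ x) - partial_truth["vocals"]))

before = vocals_err()
for _ in range(300):
    interleaved_step(x, partial_truth)
after = vocals_err()

assert after < before                       # vocals separation improved
assert np.allclose(W_dec["bass"], bass_init)  # bass decoder untouched
```

Because each sub-task loss touches only its own decoder, samples with incomplete ground truth remain usable for training, which is the property that lets independent databases be combined.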

Detailed Description

Bibliographic Details
Main Authors: Doire, Clement S. J, Okubadejo, Olumide
Format: Article
Language: English
Subjects:
Online Access: Order full text
creator Doire, Clement S. J; Okubadejo, Olumide
description Deep Neural Network-based source separation methods usually train independent models to optimize for the separation of individual sources. Although this can lead to good performance for well-defined targets, it can also be computationally expensive. The multitask alternative of a single network jointly optimizing for all targets simultaneously usually requires the availability of all target sources for each input. This requirement hampers the ability to create large training databases. In this paper, we present a model that decomposes the learnable parameters into a shared parametric model (encoder) and independent components (decoders) specific to each source. We propose an interleaved training procedure that optimizes the sub-task decoders independently and thus does not require each sample to possess a ground truth for all of its composing sources. Experimental results on MUSDB18 with the proposed method show comparable performance to independently trained models, with less trainable parameters, more efficient inference, and an encoder transferable to future target objectives. The results also show that using the proposed interleaved training procedure leads to better Source-to-Interference energy ratios when compared to the simultaneous optimization of all training objectives, even when all composing sources are available.
doi_str_mv 10.48550/arxiv.1908.05182
format Article
creationdate 2019-08-14
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
backlink https://arxiv.org/abs/1908.05182
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.1908.05182
language eng
recordid cdi_arxiv_primary_1908_05182
source arXiv.org
subjects Computer Science - Learning
Computer Science - Sound
title Interleaved Multitask Learning for Audio Source Separation with Independent Databases