Real-time Low-latency Music Source Separation using Hybrid Spectrogram-TasNet
There have been significant advances in deep learning for music demixing in recent years. However, there has been little attention given to how these neural networks can be adapted for real-time low-latency applications, which could be helpful for hearing aids, remixing audio streams and live shows…
Saved in:
Main authors: | Venkatesh, Satvik; Benilov, Arthur; Coleman, Philip; Roskam, Frederic |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Learning; Computer Science - Sound |
Online access: | Order full text |
creator | Venkatesh, Satvik; Benilov, Arthur; Coleman, Philip; Roskam, Frederic |
description | There have been significant advances in deep learning for music demixing in recent years. However, there has been little attention given to how these neural networks can be adapted for real-time low-latency applications, which could be helpful for hearing aids, remixing audio streams and live shows. In this paper, we investigate the various challenges involved in adapting current demixing models in the literature for this use case. Subsequently, inspired by the Hybrid Demucs architecture, we propose the Hybrid Spectrogram Time-domain Audio Separation Network (HS-TasNet), which utilises the advantages of the spectral and waveform domains. For a latency of 23 ms, the HS-TasNet obtains an overall signal-to-distortion ratio (SDR) of 4.65 on the MusDB test set, which increases to 5.55 with additional training data. These results demonstrate the potential of efficient demixing for real-time low-latency music applications. |
doi_str_mv | 10.48550/arxiv.2402.17701 |
format | Article |
identifier | DOI: 10.48550/arxiv.2402.17701 |
language | eng |
source | arXiv.org |
subjects | Computer Science - Learning; Computer Science - Sound |
title | Real-time Low-latency Music Source Separation using Hybrid Spectrogram-TasNet |
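A rough reading aid for the figures in the abstract above: assuming the MusDB sample rate of 44.1 kHz and a hypothetical 1024-sample processing frame (the frame size is not stated in this record), the sketch below shows why roughly 23 ms is a natural frame latency, and computes a plain signal-to-distortion ratio (SDR). Published MusDB scores are typically produced with the windowed museval/BSSEval variant, so this simplified formula is illustrative only, not the paper's evaluation code.

```python
# Minimal sketch, not from the paper: frame-latency arithmetic and a
# plain (non-windowed) SDR, under the assumptions stated above.
import numpy as np

SAMPLE_RATE = 44100   # MusDB audio is sampled at 44.1 kHz
FRAME_SIZE = 1024     # assumed frame size, not stated in this record

# One frame of audio spans FRAME_SIZE / SAMPLE_RATE seconds.
latency_ms = 1000 * FRAME_SIZE / SAMPLE_RATE
print(f"frame latency: {latency_ms:.1f} ms")  # ~23.2 ms, close to the 23 ms figure

def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """Plain SDR in dB: reference energy over residual (error) energy."""
    residual = reference - estimate
    return 10.0 * np.log10((np.sum(reference**2) + eps) / (np.sum(residual**2) + eps))

# Toy usage: score a noisy copy of a 1-second reference signal.
rng = np.random.default_rng(0)
ref = rng.standard_normal(SAMPLE_RATE)
est = ref + 0.1 * rng.standard_normal(SAMPLE_RATE)
print(f"SDR of noisy estimate: {sdr(ref, est):.2f} dB")  # about 20 dB
```

Under this definition a higher SDR means less residual error, so the reported improvement from 4.65 to 5.55 with additional training data corresponds to a cleaner separation of the estimated sources.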