Receptive-Field Regularized CNNs for Music Classification and Tagging

Convolutional Neural Networks (CNNs) have been successfully used in various Music Information Retrieval (MIR) tasks, both as end-to-end models and as feature extractors for more complex systems. However, the MIR field is still dominated by the classical VGG-based CNN architecture variants, often in combination with more complex modules such as attention, and/or techniques such as pre-training on large datasets. Deeper models such as ResNet -- which surpassed VGG by a large margin in other domains -- are rarely used in MIR. One of the main reasons for this, as we will show, is the lack of generalization of deeper CNNs in the music domain. In this paper, we present a principled way to make deep architectures like ResNet competitive for music-related tasks, based on well-designed regularization strategies. In particular, we analyze the recently introduced Receptive-Field Regularization and Shake-Shake, and show that they significantly improve the generalization of deep CNNs on music-related tasks, and that the resulting deep CNNs can outperform current more complex models such as CNNs augmented with pre-training and attention. We demonstrate this on two different MIR tasks and two corresponding datasets, thus offering our deep regularized CNNs as a new baseline for these datasets, which can also be used as a feature-extracting module in future, more complex approaches.
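The Receptive-Field Regularization named in the abstract rests on controlling how large a region of the input spectrogram each output unit of the CNN can see. A minimal sketch of the standard recurrence for the receptive field of stacked convolution/pooling layers (this is general background, not code from the paper, and the example layer list is illustrative):

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of conv/pool layers.

    Each layer is given as (kernel_size, stride). Uses the standard
    recurrence: the field grows by (k - 1) times the cumulative stride
    ("jump") of all preceding layers.
    """
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# Illustrative stack: three 3x3 convs, the middle one with stride 2.
rf = receptive_field([(3, 1), (3, 2), (3, 1)])  # -> 9
```

Receptive-field regularization then amounts to choosing kernel sizes and strides so that this quantity stays within a range that generalizes well for spectrogram inputs.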
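The other regularizer the abstract analyzes, Shake-Shake, mixes the outputs of two parallel residual branches with a random convex weight during training. A forward-only toy sketch (not the authors' implementation; the full method also re-draws an independent weight in the backward pass, which needs a framework with custom gradients, and the `b1`/`b2` branches here are hypothetical stand-ins for small conv stacks):

```python
import random

def shake_shake_block(x, branch1, branch2, training=True, alpha=None):
    """Toy two-branch residual block with Shake-Shake-style mixing.

    During training the branch outputs are combined with a random convex
    weight alpha ~ U(0, 1), re-drawn per forward pass; at test time the
    expected value alpha = 0.5 is used.
    """
    if alpha is None:
        alpha = random.random() if training else 0.5
    return x + alpha * branch1(x) + (1.0 - alpha) * branch2(x)

# Hypothetical branches; in a real CNN these would be conv stacks.
b1 = lambda x: 0.1 * x
b2 = lambda x: -0.1 * x

y = shake_shake_block(2.0, b1, b2, training=False)  # alpha = 0.5 -> 2.0
```

The random mixing acts as a strong stochastic regularizer on the branches, which is what makes the deep ResNet variants in the paper generalize on the comparatively small music datasets.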

Bibliographic details

Authors: Koutini, Khaled; Eghbal-Zadeh, Hamid; Haunschmid, Verena; Primus, Paul; Chowdhury, Shreyan; Widmer, Gerhard
Format: Article
Language: English
Subjects: Computer Science - Learning; Computer Science - Sound
Online access: Order full text
creator Koutini, Khaled
Eghbal-Zadeh, Hamid
Haunschmid, Verena
Primus, Paul
Chowdhury, Shreyan
Widmer, Gerhard
doi_str_mv 10.48550/arxiv.2007.13503
format Article
date 2020-07-27
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
url https://arxiv.org/abs/2007.13503
identifier DOI: 10.48550/arxiv.2007.13503
language eng
source arXiv.org
subjects Computer Science - Learning
Computer Science - Sound
title Receptive-Field Regularized CNNs for Music Classification and Tagging