Robust Downbeat Tracking Using an Ensemble of Convolutional Networks

In this paper, we present a novel state of the art system for automatic downbeat tracking from music signals. The audio signal is first segmented in frames which are synchronized at the tatum level of the music. We then extract different kind of features based on harmony, melody, rhythm and bass content to feed convolutional neural networks that are adapted to take advantage of each feature characteristics. This ensemble of neural networks is combined to obtain one downbeat likelihood per tatum. The downbeat sequence is finally decoded with a flexible and efficient temporal model which takes advantage of the metrical continuity of a song. We then perform an evaluation of our system on a large base of 9 datasets, compare its performance to 4 other published algorithms and obtain a significant increase of 16.8 percent points compared to the second best system, for altogether a moderate cost in test and training. The influence of each step of the method is studied to show its strengths and shortcomings.
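The final decoding step described in the abstract — turning one downbeat likelihood per tatum into a downbeat sequence while exploiting metrical continuity — can be sketched with a simple Viterbi pass over bar positions. This is an illustrative assumption, not the paper's actual temporal model (the abstract only calls it "a flexible and efficient temporal model"); the fixed 4-tatum bar, the `switch_penalty` value, and the function name `decode_downbeats` are all hypothetical.

```python
import numpy as np

def decode_downbeats(likelihoods, bar_length=4, switch_penalty=-10.0):
    """Decode a downbeat sequence from per-tatum downbeat likelihoods
    with a Viterbi pass. States are positions within a bar (position 0
    emits a downbeat); the regular one-step cyclic advance is free,
    while any other jump pays `switch_penalty`, which encodes the
    metrical-continuity assumption."""
    lik = np.asarray(likelihoods, dtype=float)
    n, k = len(lik), bar_length
    eps = 1e-9
    # Emission log-probs: state 0 matches high downbeat likelihood,
    # all other bar positions match low downbeat likelihood.
    emit = np.tile(np.log(1.0 - lik + eps)[:, None], (1, k))
    emit[:, 0] = np.log(lik + eps)
    delta = emit[0].copy()
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        new_delta = np.empty(k)
        for s in range(k):
            scores = delta + switch_penalty   # off-grid jumps are penalized
            prev = (s - 1) % k
            scores[prev] = delta[prev]        # regular metrical advance
            back[t, s] = int(np.argmax(scores))
            new_delta[s] = scores[back[t, s]] + emit[t, s]
        delta = new_delta
    # Backtrack the best state path; True where a downbeat is decoded.
    path = np.empty(n, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(n - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path == 0
```

On a synthetic likelihood curve with a clear 4-tatum period (e.g. `[0.9, 0.1, 0.1, 0.1]` repeated), this decodes downbeats exactly on the high-likelihood tatums, even if individual likelihoods were noisy, because the penalty makes off-grid paths expensive.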

Saved in:

Bibliographic details
Main authors: Durand, S; Bello, J. P; David, B; Richard, G
Format: Article
Language: English
Online access: Order full text
Creation date: 2016-05-26
Rights: http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI: 10.48550/arxiv.1605.08396
Source: arXiv.org
Subjects: Computer Science - Neural and Evolutionary Computing; Computer Science - Sound