Robust Downbeat Tracking Using an Ensemble of Convolutional Networks

In this paper, we present a novel state of the art system for automatic downbeat tracking from music signals. The audio signal is first segmented in frames which are synchronized at the tatum level of the music. We then extract different kind of features based on harmony, melody, rhythm and bass content to feed convolutional neural networks that are adapted to take advantage of each feature characteristics. This ensemble of neural networks is combined to obtain one downbeat likelihood per tatum. The downbeat sequence is finally decoded with a flexible and efficient temporal model which takes advantage of the metrical continuity of a song. We then perform an evaluation of our system on a large base of 9 datasets, compare its performance to 4 other published algorithms and obtain a significant increase of 16.8 percent points compared to the second best system, for altogether a moderate cost in test and training. The influence of each step of the method is studied to show its strengths and shortcomings.
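The final decoding step described in the abstract — turning one downbeat likelihood per tatum into a downbeat sequence while exploiting metrical continuity — can be sketched with a simple Viterbi pass over bar positions. This is an illustrative assumption, not the paper's actual temporal model (the abstract only calls it "a flexible and efficient temporal model"); the fixed 4-tatum bar, the `switch_penalty` value, and the function name `decode_downbeats` are all hypothetical.

```python
import numpy as np

def decode_downbeats(likelihoods, bar_length=4, switch_penalty=-10.0):
    """Decode a downbeat sequence from per-tatum downbeat likelihoods
    with a Viterbi pass. States are positions within a bar (position 0
    emits a downbeat); the regular one-step cyclic advance is free,
    while any other jump pays `switch_penalty`, which encodes the
    metrical-continuity assumption."""
    lik = np.asarray(likelihoods, dtype=float)
    n, k = len(lik), bar_length
    eps = 1e-9
    # Emission log-probs: state 0 matches high downbeat likelihood,
    # all other bar positions match low downbeat likelihood.
    emit = np.tile(np.log(1.0 - lik + eps)[:, None], (1, k))
    emit[:, 0] = np.log(lik + eps)
    delta = emit[0].copy()
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        new_delta = np.empty(k)
        for s in range(k):
            scores = delta + switch_penalty   # off-grid jumps are penalized
            prev = (s - 1) % k
            scores[prev] = delta[prev]        # regular metrical advance
            back[t, s] = int(np.argmax(scores))
            new_delta[s] = scores[back[t, s]] + emit[t, s]
        delta = new_delta
    # Backtrack the best state path; True where a downbeat is decoded.
    path = np.empty(n, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(n - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path == 0
```

On a synthetic likelihood curve with a clear 4-tatum period (e.g. `[0.9, 0.1, 0.1, 0.1]` repeated), this decodes downbeats exactly on the high-likelihood tatums, even if individual likelihoods were noisy, because the penalty makes off-grid paths expensive.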

Saved in:

Bibliographic details
Main authors: Durand, S; Bello, J. P; David, B; Richard, G
Format: Article
Language: English
Online access: Order full text
Creation date: 2016-05-26
Rights: http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI: 10.48550/arxiv.1605.08396
Source: arXiv.org
Subjects: Computer Science - Neural and Evolutionary Computing; Computer Science - Sound