Toward Fully Self-Supervised Multi-Pitch Estimation

Multi-pitch estimation is a decades-long research problem involving the detection of pitch activity associated with concurrent musical events within multi-instrument mixtures. Supervised learning techniques have demonstrated solid performance on more narrow characterizations of the task, but suffer...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	arXiv.org 2024-02
Hauptverfasser:	Cwitkowitz, Frank, Duan, Zhiyao
Format:	Artikel
Sprache:	eng
Schlagworte:	Annotations Datasets Geometric transformation Mixtures Music Pitch estimation Self-supervised learning
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Cwitkowitz, Frank Duan, Zhiyao
description	Multi-pitch estimation is a decades-long research problem involving the detection of pitch activity associated with concurrent musical events within multi-instrument mixtures. Supervised learning techniques have demonstrated solid performance on more narrow characterizations of the task, but suffer from limitations concerning the shortage of large-scale and diverse polyphonic music datasets with multi-pitch annotations. We present a suite of self-supervised learning objectives for multi-pitch estimation, which encourage the concentration of support around harmonics, invariance to timbral transformations, and equivariance to geometric transformations. These objectives are sufficient to train an entirely convolutional autoencoder to produce multi-pitch salience-grams directly, without any fine-tuning. Despite training exclusively on a collection of synthetic single-note audio samples, our fully self-supervised framework generalizes to polyphonic music mixtures, and achieves performance comparable to supervised models trained on conventional multi-pitch datasets.
format	Article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2932315487</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2932315487</sourcerecordid><originalsourceid>FETCH-proquest_journals_29323154873</originalsourceid><addsrcrecordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mQwDskvTyxKUXArzcmpVAhOzUnTDS4tSC0qyyxOTVHwLc0pydQNyCxJzlBwLS7JzE0syczP42FgTUvMKU7lhdLcDMpuriHOHroFRfmFpanFJfFZ-aVFeUCpeCNLYyNjQ1MTC3Nj4lQBABUXNDs</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2932315487</pqid></control><display><type>article</type><title>Toward Fully Self-Supervised Multi-Pitch Estimation</title><source>Free E- Journals</source><creator>Cwitkowitz, Frank ; Duan, Zhiyao</creator><creatorcontrib>Cwitkowitz, Frank ; Duan, Zhiyao</creatorcontrib><description>Multi-pitch estimation is a decades-long research problem involving the detection of pitch activity associated with concurrent musical events within multi-instrument mixtures. Supervised learning techniques have demonstrated solid performance on more narrow characterizations of the task, but suffer from limitations concerning the shortage of large-scale and diverse polyphonic music datasets with multi-pitch annotations. We present a suite of self-supervised learning objectives for multi-pitch estimation, which encourage the concentration of support around harmonics, invariance to timbral transformations, and equivariance to geometric transformations. These objectives are sufficient to train an entirely convolutional autoencoder to produce multi-pitch salience-grams directly, without any fine-tuning. Despite training exclusively on a collection of synthetic single-note audio samples, our fully self-supervised framework generalizes to polyphonic music mixtures, and achieves performance comparable to supervised models trained on conventional multi-pitch datasets.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Annotations ; Datasets ; Geometric transformation ; Mixtures ; Music ; Pitch estimation ; Self-supervised learning</subject><ispartof>arXiv.org, 2024-02</ispartof><rights>2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,784</link.rule.ids></links><search><creatorcontrib>Cwitkowitz, Frank</creatorcontrib><creatorcontrib>Duan, Zhiyao</creatorcontrib><title>Toward Fully Self-Supervised Multi-Pitch Estimation</title><title>arXiv.org</title><description>Multi-pitch estimation is a decades-long research problem involving the detection of pitch activity associated with concurrent musical events within multi-instrument mixtures. Supervised learning techniques have demonstrated solid performance on more narrow characterizations of the task, but suffer from limitations concerning the shortage of large-scale and diverse polyphonic music datasets with multi-pitch annotations. We present a suite of self-supervised learning objectives for multi-pitch estimation, which encourage the concentration of support around harmonics, invariance to timbral transformations, and equivariance to geometric transformations. These objectives are sufficient to train an entirely convolutional autoencoder to produce multi-pitch salience-grams directly, without any fine-tuning. Despite training exclusively on a collection of synthetic single-note audio samples, our fully self-supervised framework generalizes to polyphonic music mixtures, and achieves performance comparable to supervised models trained on conventional multi-pitch datasets.</description><subject>Annotations</subject><subject>Datasets</subject><subject>Geometric transformation</subject><subject>Mixtures</subject><subject>Music</subject><subject>Pitch estimation</subject><subject>Self-supervised learning</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mQwDskvTyxKUXArzcmpVAhOzUnTDS4tSC0qyyxOTVHwLc0pydQNyCxJzlBwLS7JzE0syczP42FgTUvMKU7lhdLcDMpuriHOHroFRfmFpanFJfFZ-aVFeUCpeCNLYyNjQ1MTC3Nj4lQBABUXNDs</recordid><startdate>20240223</startdate><enddate>20240223</enddate><creator>Cwitkowitz, Frank</creator><creator>Duan, Zhiyao</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20240223</creationdate><title>Toward Fully Self-Supervised Multi-Pitch Estimation</title><author>Cwitkowitz, Frank ; Duan, Zhiyao</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_29323154873</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Annotations</topic><topic>Datasets</topic><topic>Geometric transformation</topic><topic>Mixtures</topic><topic>Music</topic><topic>Pitch estimation</topic><topic>Self-supervised learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Cwitkowitz, Frank</creatorcontrib><creatorcontrib>Duan, Zhiyao</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Access via ProQuest (Open Access)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Cwitkowitz, Frank</au><au>Duan, Zhiyao</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Toward Fully Self-Supervised Multi-Pitch Estimation</atitle><jtitle>arXiv.org</jtitle><date>2024-02-23</date><risdate>2024</risdate><eissn>2331-8422</eissn><abstract>Multi-pitch estimation is a decades-long research problem involving the detection of pitch activity associated with concurrent musical events within multi-instrument mixtures. Supervised learning techniques have demonstrated solid performance on more narrow characterizations of the task, but suffer from limitations concerning the shortage of large-scale and diverse polyphonic music datasets with multi-pitch annotations. We present a suite of self-supervised learning objectives for multi-pitch estimation, which encourage the concentration of support around harmonics, invariance to timbral transformations, and equivariance to geometric transformations. These objectives are sufficient to train an entirely convolutional autoencoder to produce multi-pitch salience-grams directly, without any fine-tuning. Despite training exclusively on a collection of synthetic single-note audio samples, our fully self-supervised framework generalizes to polyphonic music mixtures, and achieves performance comparable to supervised models trained on conventional multi-pitch datasets.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2024-02
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2932315487
source	Free E- Journals
subjects	Annotations Datasets Geometric transformation Mixtures Music Pitch estimation Self-supervised learning
title	Toward Fully Self-Supervised Multi-Pitch Estimation
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T05%3A14%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Toward%20Fully%20Self-Supervised%20Multi-Pitch%20Estimation&rft.jtitle=arXiv.org&rft.au=Cwitkowitz,%20Frank&rft.date=2024-02-23&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2932315487%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2932315487&rft_id=info:pmid/&rfr_iscdi=true