Self-supervised Auxiliary Loss for Metric Learning in Music Similarity-based Retrieval and Auto-tagging

In the realm of music information retrieval, similarity-based retrieval and auto-tagging serve as essential components. Given the limitations and non-scalability of human supervision signals, it becomes crucial for models to learn from alternative sources to enhance their performance. Self-supervised learning, which exclusively relies on learning signals derived from music audio data, has demonstrated its efficacy in the context of auto-tagging. In this study, we propose a model that builds on the self-supervised learning approach to address the similarity-based retrieval challenge by introducing our method of metric learning with a self-supervised auxiliary loss. Furthermore, diverging from conventional self-supervised learning methodologies, we discovered the advantages of concurrently training the model with both self-supervision and supervision signals, without freezing pre-trained models. We also found that refraining from employing augmentation during the fine-tuning phase yields better results. Our experimental results confirm that the proposed methodology enhances retrieval and tagging performance metrics in two distinct scenarios: one where human-annotated tags are consistently available for all music tracks, and another where such tags are accessible only for a subset of tracks.
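
The record carries no implementation details, but the core idea stated in the abstract — training an encoder jointly on a supervised metric-learning loss and a self-supervised auxiliary loss, without freezing pre-trained weights — can be sketched in a few lines. The triplet loss, the NT-Xent-style contrastive term, the `encoder` interface, the use of un-augmented excerpts as views, and the `aux_weight` factor below are illustrative assumptions, not the authors' actual design.

```python
# Minimal sketch (assumed PyTorch, illustrative losses) of the kind of joint
# objective the abstract describes: a supervised metric-learning term plus a
# weighted self-supervised auxiliary term, backpropagated through an unfrozen encoder.
import torch
import torch.nn.functional as F


def nt_xent_loss(z1, z2, temperature=0.1):
    # Self-supervised contrastive loss between two embeddings of the same track;
    # positive pairs lie on the diagonal of the similarity matrix.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)


def joint_loss(encoder, anchor, positive, negative, view1, view2, aux_weight=0.5):
    # Supervised metric-learning term (tag-derived triplets are an assumption)
    # plus the self-supervised auxiliary term; gradients from both terms flow
    # into the encoder, i.e. nothing is frozen. Here view1/view2 are taken to be
    # two plain excerpts of the same track, reflecting the abstract's note that
    # augmentation is avoided during fine-tuning.
    sup = F.triplet_margin_loss(encoder(anchor), encoder(positive), encoder(negative))
    aux = nt_xent_loss(encoder(view1), encoder(view2))
    return sup + aux_weight * aux
```
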

Bibliographic Details

Main Authors: Akama, Taketo; Kitano, Hiroaki; Takematsu, Katsuhiro; Miyajima, Yasushi; Polouliakh, Natalia
Format: Article
Language: English
Subjects: Computer Science - Information Retrieval; Computer Science - Learning; Computer Science - Sound
DOI: 10.48550/arxiv.2304.07449
Published: 2023-04-14
Source: arXiv.org
Online Access: Order full text