Multi-level Attention Model for Weakly Supervised Audio Classification

In this paper, we propose a multi-level attention model to solve the weakly labelled audio classification problem. The objective of audio classification is to predict the presence or absence of audio events in an audio clip. Recently, Google published a large-scale weakly labelled dataset called Audio Set, where each audio clip is annotated only with the presence or absence of audio events, without their onset and offset times. Our multi-level attention model is an extension of the previously proposed single-level attention model. It consists of several attention modules applied to intermediate neural network layers. The outputs of these attention modules are concatenated into a vector, followed by a multi-label classifier that makes the final prediction for each class. Experiments show that our model achieves a mean average precision (mAP) of 0.360, outperforming the state-of-the-art single-level attention model (mAP 0.327) and the Google baseline (mAP 0.314).
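The architecture described in the abstract can be illustrated with a short sketch. The following is a minimal PyTorch rendering, assuming hypothetical module names (`AttentionModule`, `MultiLevelAttention`) and an illustrative hidden width of 600; it is not the authors' released code, only an instance of the idea: per-class attention pooling attached to two intermediate embedding levels, with the pooled outputs concatenated and fed to a multi-label classifier. The 128-dimensional inputs and 527 classes match the Audio Set bottleneck features the paper builds on.

```python
# Minimal sketch (not the authors' code) of the multi-level attention idea:
# attention modules on several intermediate layers, outputs concatenated,
# then a multi-label classifier. Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn


class AttentionModule(nn.Module):
    """Single-level attention pooling over time frames: (B, T, D) -> (B, C)."""

    def __init__(self, in_dim: int, n_classes: int):
        super().__init__()
        self.att = nn.Linear(in_dim, n_classes)   # per-class attention scores
        self.cla = nn.Linear(in_dim, n_classes)   # per-frame class predictions

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, features)
        att = torch.softmax(self.att(x), dim=1)    # normalize over time
        cla = torch.sigmoid(self.cla(x))           # per-frame probabilities
        return (att * cla).sum(dim=1)              # attention-weighted average


class MultiLevelAttention(nn.Module):
    """Embedding blocks with one attention module per intermediate level."""

    def __init__(self, in_dim: int = 128, hidden: int = 600, n_classes: int = 527):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        self.att1 = AttentionModule(hidden, n_classes)
        self.att2 = AttentionModule(hidden, n_classes)
        # Concatenated attention outputs feed the final multi-label classifier.
        self.classifier = nn.Linear(2 * n_classes, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h1 = self.block1(x)                        # first intermediate level
        h2 = self.block2(h1)                       # second intermediate level
        z = torch.cat([self.att1(h1), self.att2(h2)], dim=-1)
        return torch.sigmoid(self.classifier(z))   # multi-label probabilities


# Example: a batch of 4 clips, 10 time frames of 128-dim Audio Set embeddings.
model = MultiLevelAttention()
probs = model(torch.randn(4, 10, 128))             # -> shape (4, 527)
```

In this sketch, each attention module sees a different level of abstraction of the input, which is the motivation the abstract gives for concatenating several levels rather than pooling only the final layer.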

Bibliographic details
Main authors: Yu, Changsong; Barsim, Karim Said; Kong, Qiuqiang; Yang, Bin
Format: Article
Language: English
Published: 2018-03-06
Subjects: Computer Science - Sound
DOI: 10.48550/arxiv.1803.02353
Source: arXiv.org
Online access: Order full text