Improved feature extraction for CRNN-based multiple sound source localization

In this work, we propose to extend a state-of-the-art multi-source localization system based on a convolutional recurrent neural network and Ambisonics signals. We significantly improve the performance of the baseline network by changing the layout between convolutional and pooling layers. We propose several configurations with more convolutional layers and smaller pooling sizes in-between, so that less information is lost across the layers, leading to a better feature extraction. In parallel, we test the system's ability to localize up to 3 sources, in which case the improved feature extraction provides the most significant boost in accuracy. We evaluate and compare these improved configurations on synthetic and real-world data. The obtained results show a quite substantial improvement of the multiple sound source localization performance over the baseline network.
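The abstract's key idea, trading a few aggressive pooling operations for more convolutional layers with smaller pooling sizes in between, can be illustrated with a simple shape calculation. The pooling factors below are hypothetical stand-ins chosen for illustration, not the paper's actual layer configurations:

```python
def pooled_size(n_bins, pool_factors):
    """Propagate a feature-map dimension through a stack of pooling layers,
    each of which divides the dimension by its pooling factor."""
    for p in pool_factors:
        n_bins //= p
    return n_bins

# Illustrative configurations (assumed, not taken from the paper):
baseline = [8, 8, 4]        # few layers, large pooling between them
improved = [2, 2, 2, 2, 2]  # more layers, smaller pooling in between

freq_bins = 512  # hypothetical number of input frequency bins

# Although both stacks reduce the input, the distributed small poolings
# leave a finer-grained feature map for the recurrent layers to consume.
print(pooled_size(freq_bins, baseline))  # 512 / (8*8*4)   -> 2
print(pooled_size(freq_bins, improved))  # 512 / 2**5      -> 16
```

The point of the sketch: with the same number of pooling stages spread over more convolutional layers, each layer discards less spatial detail before it is aggregated, which is the mechanism the abstract credits for the improved feature extraction.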

Bibliographic details
Authors: Grumiaux, Pierre-Amaury; Kitic, Srdan; Girin, Laurent; Guérin, Alexandre
Format: Article
Language: eng
Subjects: Computer Science - Sound
description In this work, we propose to extend a state-of-the-art multi-source localization system based on a convolutional recurrent neural network and Ambisonics signals. We significantly improve the performance of the baseline network by changing the layout between convolutional and pooling layers. We propose several configurations with more convolutional layers and smaller pooling sizes in-between, so that less information is lost across the layers, leading to a better feature extraction. In parallel, we test the system's ability to localize up to 3 sources, in which case the improved feature extraction provides the most significant boost in accuracy. We evaluate and compare these improved configurations on synthetic and real-world data. The obtained results show a quite substantial improvement of the multiple sound source localization performance over the baseline network.
doi 10.48550/arxiv.2105.01897
format Article
date 2021-05-05
rights http://creativecommons.org/licenses/by/4.0 (free to read)
language eng
source arXiv.org
subjects Computer Science - Sound
title Improved feature extraction for CRNN-based multiple sound source localization
url https://arxiv.org/abs/2105.01897