Improved feature extraction for CRNN-based multiple sound source localization

In this work, we propose to extend a state-of-the-art multi-source localization system based on a convolutional recurrent neural network and Ambisonics signals. We significantly improve the performance of the baseline network by changing the layout between convolutional and pooling layers. We propose several configurations with more convolutional layers and smaller pooling sizes in-between, so that less information is lost across the layers, leading to a better feature extraction. In parallel, we test the system's ability to localize up to 3 sources, in which case the improved feature extraction provides the most significant boost in accuracy. We evaluate and compare these improved configurations on synthetic and real-world data. The obtained results show a quite substantial improvement of the multiple sound source localization performance over the baseline network.
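The abstract's key idea, trading a few aggressive pooling operations for more convolutional layers with smaller pooling sizes in between, can be illustrated with a simple shape calculation. The pooling factors below are hypothetical stand-ins chosen for illustration, not the paper's actual layer configurations:

```python
def pooled_size(n_bins, pool_factors):
    """Propagate a feature-map dimension through a stack of pooling layers,
    each of which divides the dimension by its pooling factor."""
    for p in pool_factors:
        n_bins //= p
    return n_bins

# Illustrative configurations (assumed, not taken from the paper):
baseline = [8, 8, 4]        # few layers, large pooling between them
improved = [2, 2, 2, 2, 2]  # more layers, smaller pooling in between

freq_bins = 512  # hypothetical number of input frequency bins

# Although both stacks reduce the input, the distributed small poolings
# leave a finer-grained feature map for the recurrent layers to consume.
print(pooled_size(freq_bins, baseline))  # 512 / (8*8*4)   -> 2
print(pooled_size(freq_bins, improved))  # 512 / 2**5      -> 16
```

The point of the sketch: with the same number of pooling stages spread over more convolutional layers, each layer discards less spatial detail before it is aggregated, which is the mechanism the abstract credits for the improved feature extraction.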

Bibliographic details
Authors: Grumiaux, Pierre-Amaury; Kitic, Srdan; Girin, Laurent; Guérin, Alexandre
Format: Article
Language: eng
Subjects: Computer Science - Sound
description In this work, we propose to extend a state-of-the-art multi-source localization system based on a convolutional recurrent neural network and Ambisonics signals. We significantly improve the performance of the baseline network by changing the layout between convolutional and pooling layers. We propose several configurations with more convolutional layers and smaller pooling sizes in-between, so that less information is lost across the layers, leading to a better feature extraction. In parallel, we test the system's ability to localize up to 3 sources, in which case the improved feature extraction provides the most significant boost in accuracy. We evaluate and compare these improved configurations on synthetic and real-world data. The obtained results show a quite substantial improvement of the multiple sound source localization performance over the baseline network.
doi 10.48550/arxiv.2105.01897
format Article
date 2021-05-05
rights http://creativecommons.org/licenses/by/4.0 (free to read)
language eng
source arXiv.org
subjects Computer Science - Sound
title Improved feature extraction for CRNN-based multiple sound source localization
url https://arxiv.org/abs/2105.01897