Dual input neural networks for positional sound source localization

In many signal processing applications, metadata may be advantageously used in conjunction with a high-dimensional signal to produce a desired output. In the case of classical Sound Source Localization (SSL) algorithms, information from high-dimensional, multichannel audio signals received by many...

Bibliographic details
Published in:	arXiv.org 2023-08
Main authors:	Grinstein, Eric; Neo, Vincent W; Naylor, Patrick A
Format:	Article
Language:	eng
Subjects:	Acoustic properties; Algorithms; Audio data; Audio signals; Localization; Microphones; Neural networks; Recurrent neural networks; Signal processing; Sound localization; Sound sources
Online access:	Full text

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Grinstein, Eric ; Neo, Vincent W ; Naylor, Patrick A
description	In many signal processing applications, metadata may be advantageously used in conjunction with a high-dimensional signal to produce a desired output. In the case of classical Sound Source Localization (SSL) algorithms, information from high-dimensional, multichannel audio signals received by many distributed microphones is combined with information describing acoustic properties of the scene, such as the microphones' coordinates in space, to estimate the position of a sound source. We introduce Dual Input Neural Networks (DI-NNs) as a simple and effective way to model these two data types in a neural network. We train and evaluate our proposed DI-NN on scenarios of varying difficulty and realism and compare it against an alternative architecture, a classical Least-Squares (LS) method, and a classical Convolutional Recurrent Neural Network (CRNN). Our results show that the DI-NN significantly outperforms the baselines, achieving a localization error five times lower than the LS method and two times lower than the CRNN on a test dataset of real recordings.
format	Article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2847997651</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2847997651</sourcerecordid><originalsourceid>FETCH-proquest_journals_28479976513</originalsourceid><addsrcrecordid>eNqNTEsKwjAUDIJg0d4h4LqQJv2uq-IB3JdQU0gb8mJeguDpTcEDuJlhvjuScSHKoqs4P5AccWGM8abldS0yMlyiNFRbFwO1KvokrApv8CvSGTx1gDposMlHiPa5oZ8UNTBJoz9yy05kP0uDKv_xkZxv18dwL5yHV1QYxiWN0gWOvKvavm-buhT_tb5r4jtC</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2847997651</pqid></control><display><type>article</type><title>Dual input neural networks for positional sound source localization</title><source>Freely Accessible Journals</source><creator>Grinstein, Eric ; Neo, Vincent W ; Naylor, Patrick A</creator><creatorcontrib>Grinstein, Eric ; Neo, Vincent W ; Naylor, Patrick A</creatorcontrib><description>In many signal processing applications, metadata may be advantageously used in conjunction with a high dimensional signal to produce a desired output. In the case of classical Sound Source Localization (SSL) algorithms, information from a high dimensional, multichannel audio signals received by many distributed microphones is combined with information describing acoustic properties of the scene, such as the microphones' coordinates in space, to estimate the position of a sound source. We introduce Dual Input Neural Networks (DI-NNs) as a simple and effective way to model these two data types in a neural network. We train and evaluate our proposed DI-NN on scenarios of varying difficulty and realism and compare it against an alternative architecture, a classical Least-Squares (LS) method as well as a classical Convolutional Recurrent Neural Network (CRNN). 
Our results show that the DI-NN significantly outperforms the baselines, achieving a five times lower localization error than the LS method and two times lower than the CRNN in a test dataset of real recordings.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Acoustic properties ; Algorithms ; Audio data ; Audio signals ; Localization ; Microphones ; Neural networks ; Recurrent neural networks ; Signal processing ; Sound localization ; Sound sources</subject><ispartof>arXiv.org, 2023-08</ispartof><rights>2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,780</link.rule.ids></links><search><creatorcontrib>Grinstein, Eric</creatorcontrib><creatorcontrib>Neo, Vincent W</creatorcontrib><creatorcontrib>Naylor, Patrick A</creatorcontrib><title>Dual input neural networks for positional sound source localization</title><title>arXiv.org</title><description>In many signal processing applications, metadata may be advantageously used in conjunction with a high dimensional signal to produce a desired output. In the case of classical Sound Source Localization (SSL) algorithms, information from a high dimensional, multichannel audio signals received by many distributed microphones is combined with information describing acoustic properties of the scene, such as the microphones' coordinates in space, to estimate the position of a sound source. 
We introduce Dual Input Neural Networks (DI-NNs) as a simple and effective way to model these two data types in a neural network. We train and evaluate our proposed DI-NN on scenarios of varying difficulty and realism and compare it against an alternative architecture, a classical Least-Squares (LS) method as well as a classical Convolutional Recurrent Neural Network (CRNN). Our results show that the DI-NN significantly outperforms the baselines, achieving a five times lower localization error than the LS method and two times lower than the CRNN in a test dataset of real recordings.</description><subject>Acoustic properties</subject><subject>Algorithms</subject><subject>Audio data</subject><subject>Audio signals</subject><subject>Localization</subject><subject>Microphones</subject><subject>Neural networks</subject><subject>Recurrent neural networks</subject><subject>Signal processing</subject><subject>Sound localization</subject><subject>Sound sources</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNqNTEsKwjAUDIJg0d4h4LqQJv2uq-IB3JdQU0gb8mJeguDpTcEDuJlhvjuScSHKoqs4P5AccWGM8abldS0yMlyiNFRbFwO1KvokrApv8CvSGTx1gDposMlHiPa5oZ8UNTBJoz9yy05kP0uDKv_xkZxv18dwL5yHV1QYxiWN0gWOvKvavm-buhT_tb5r4jtC</recordid><startdate>20230808</startdate><enddate>20230808</enddate><creator>Grinstein, Eric</creator><creator>Neo, Vincent W</creator><creator>Naylor, Patrick A</creator><general>Cornell University Library, 
arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PHGZM</scope><scope>PHGZT</scope><scope>PIMPY</scope><scope>PKEHL</scope><scope>PQEST</scope><scope>PQGLB</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PTHSS</scope></search><sort><creationdate>20230808</creationdate><title>Dual input neural networks for positional sound source localization</title><author>Grinstein, Eric ; Neo, Vincent W ; Naylor, Patrick A</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_28479976513</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Acoustic properties</topic><topic>Algorithms</topic><topic>Audio data</topic><topic>Audio signals</topic><topic>Localization</topic><topic>Microphones</topic><topic>Neural networks</topic><topic>Recurrent neural networks</topic><topic>Signal processing</topic><topic>Sound localization</topic><topic>Sound sources</topic><toplevel>online_resources</toplevel><creatorcontrib>Grinstein, Eric</creatorcontrib><creatorcontrib>Neo, Vincent W</creatorcontrib><creatorcontrib>Naylor, Patrick A</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering 
Collection</collection><collection>Engineering Database</collection><collection>ProQuest Central (New)</collection><collection>ProQuest One Academic (New)</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Middle East (New)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Applied & Life Sciences</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Grinstein, Eric</au><au>Neo, Vincent W</au><au>Naylor, Patrick A</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Dual input neural networks for positional sound source localization</atitle><jtitle>arXiv.org</jtitle><date>2023-08-08</date><risdate>2023</risdate><eissn>2331-8422</eissn><abstract>In many signal processing applications, metadata may be advantageously used in conjunction with a high dimensional signal to produce a desired output. In the case of classical Sound Source Localization (SSL) algorithms, information from a high dimensional, multichannel audio signals received by many distributed microphones is combined with information describing acoustic properties of the scene, such as the microphones' coordinates in space, to estimate the position of a sound source. We introduce Dual Input Neural Networks (DI-NNs) as a simple and effective way to model these two data types in a neural network. We train and evaluate our proposed DI-NN on scenarios of varying difficulty and realism and compare it against an alternative architecture, a classical Least-Squares (LS) method as well as a classical Convolutional Recurrent Neural Network (CRNN). 
Our results show that the DI-NN significantly outperforms the baselines, achieving a five times lower localization error than the LS method and two times lower than the CRNN in a test dataset of real recordings.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2023-08
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2847997651
source	Freely Accessible Journals
subjects	Acoustic properties ; Algorithms ; Audio data ; Audio signals ; Localization ; Microphones ; Neural networks ; Recurrent neural networks ; Signal processing ; Sound localization ; Sound sources
title	Dual input neural networks for positional sound source localization
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-19T02%3A27%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Dual%20input%20neural%20networks%20for%20positional%20sound%20source%20localization&rft.jtitle=arXiv.org&rft.au=Grinstein,%20Eric&rft.date=2023-08-08&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2847997651%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2847997651&rft_id=info:pmid/&rfr_iscdi=true