DOA Estimation for Multiple Speech Sources Based on Flexible Single-Source Zones and Concentration Weighting

Direction of arrival (DOA) estimation is the key to many audio applications. Recently, sparse component analysis (SCA)-based methods have attracted much attention, in which single-source points (SSPs) and single-source zones (SSZs) where one source is dominant over the others in time-frequency domai...

Bibliographic details
Published in: IEEE sensors journal 2023-05, Vol.23 (10), p.1-1
Main authors: Zhao, Zhao, Kan, Hongrui, Lin, Jiale, Xu, Zhiyong
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 1
container_issue 10
container_start_page 1
container_title IEEE sensors journal
container_volume 23
creator Zhao, Zhao
Kan, Hongrui
Lin, Jiale
Xu, Zhiyong
description Direction of arrival (DOA) estimation is key to many audio applications. Recently, sparse component analysis (SCA)-based methods have attracted much attention. These methods usually detect single-source points (SSPs) and single-source zones (SSZs), where one source dominates the others in the time-frequency domain, and use them to construct a pooled histogram containing multi-source DOA information. However, the SSZ size in existing methods is fixed and empirically predetermined, so it cannot adapt to the varying spectro-temporal properties of speech sources. Furthermore, a higher SSP concentration in an SSZ implies a locally stronger dominant source and more reliable DOA information extracted therein, which existing methods also do not take into account. To address these problems, this paper presents a DOA estimation algorithm for multiple speech sources based on flexible SSZs and concentration weighting. First, in each frame, correlation coefficients of time delay vectors across adjacent frequency bins are calculated to identify SSPs, and flexible SSZs are then constructed from a varying number of SSPs located at consecutive frequency bins. Next, the number of SSPs in each flexible SSZ is taken as a proxy for the corresponding degree of concentration and used as a weighting factor to form the pooled histogram. Finally, a matching pursuit (MP)-based approach is used to obtain the multi-source DOA estimates. Simulation results show that the proposed method significantly outperforms existing approaches in terms of the noise floor of the pooled histogram, angular resolution, and performance under various signal-to-noise ratios and reverberant conditions. Real-world experiments also verify its effectiveness and demonstrate considerably reduced computational complexity.
doi_str_mv 10.1109/JSEN.2023.3263861
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_JSEN_2023_3263861</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10097554</ieee_id><sourcerecordid>2814572192</sourcerecordid><originalsourceid>FETCH-LOGICAL-c294t-524c6b41d06deda470eabe1bc798fec82c025ae1d944115aa1b791f0583d6ebe3</originalsourceid><addsrcrecordid>eNpNkMtOwzAQRS0EEqXwAUgsLLFO8fgRJ8tSWh4qdFEQiE3kOJM2VUhCnErw9zhKF6xmFufeGR1CLoFNAFh887Sev0w442IieCiiEI7ICJSKAtAyOu53wQIp9McpOXNuxxjEWukRKe9WUzp3XfFluqKuaF639HlfdkVTIl03iHZL1_W-tejorXGYUQ8tSvwp0h4oqk2JwQDQz7rylKkyOqsri1XXDp3vWGy2nUfPyUluSocXhzkmb4v56-whWK7uH2fTZWB5LLtAcWnDVELGwgwzIzVDkyKkVsdRjjbilnFlELJYSgBlDKQ6hpypSGQhpijG5Hrobdr6e4-uS3b-w8qfTHgEUmkOMfcUDJRta-dazJOm9Rra3wRY0ktNeqlJLzU5SPWZqyFTIOI_nnmbSoo_uet0OA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2814572192</pqid></control><display><type>article</type><title>DOA Estimation for Multiple Speech Sources Based on Flexible Single-Source Zones and Concentration Weighting</title><source>IEEE Xplore</source><creator>Zhao, Zhao ; Kan, Hongrui ; Lin, Jiale ; Xu, Zhiyong</creator><creatorcontrib>Zhao, Zhao ; Kan, Hongrui ; Lin, Jiale ; Xu, Zhiyong</creatorcontrib><description>Direction of arrival (DOA) estimation is the key to many audio applications. Recently, sparse component analysis (SCA)-based methods have attracted much attention, in which single-source points (SSPs) and single-source zones (SSZs) where one source is dominant over the others in time-frequency domain are usually detected to construct the pooled histogram containing multi-source DOA information. Nonetheless, the SSZ size in existing methods is fixed and empirically predetermined, which cannot accommodate to the varying spectro-temporal property of speech sources. 
Furthermore, higher SSP concentration in a SSZ implies a locally stronger dominant source as well as more reliable DOA information extracted therein, which however is also not taken into account yet. To address these problems, a DOA estimation algorithm for multiple speech sources based on flexible SSZs and concentration weighting is presented in this paper. First, in each frame, correlation coefficients of time delay vectors across adjacent frequency bins are calculated to identify SSPs, followed by flexible SSZs construction using varying number of SSPs located at consecutive frequency bins. Next, the number of SSPs in each flexible SSZ is considered as a proxy of corresponding concentration degree, and employed as weighting factor to form the pooled histogram. Finally, a matching pursuit (MP)-based approach is utilized to obtain multi-source DOA estimates. Simulation results reveal that the proposed method significantly outperforms existing approaches in terms of noise floor in pooled histogram, angular resolution, and performance under various signal-to-noise ratio and reverberant conditions. 
Real-world experiments also verify its effectiveness, and meanwhile demonstrate considerably reduced computational complexity.</description><identifier>ISSN: 1530-437X</identifier><identifier>EISSN: 1558-1748</identifier><identifier>DOI: 10.1109/JSEN.2023.3263861</identifier><identifier>CODEN: ISJEAZ</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Algorithms ; Angular resolution ; Bins ; concentration weighting ; Correlation coefficients ; Direction of arrival ; Direction-of-arrival estimation ; DOA estimation ; Estimation ; flexible single-source zone ; Histograms ; Matched pursuit ; microphone array ; Microphone arrays ; Reverberation ; Sensors ; Signal to noise ratio ; Speech ; Time lag ; Time-frequency analysis ; Weighting</subject><ispartof>IEEE sensors journal, 2023-05, Vol.23 (10), p.1-1</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c294t-524c6b41d06deda470eabe1bc798fec82c025ae1d944115aa1b791f0583d6ebe3</citedby><cites>FETCH-LOGICAL-c294t-524c6b41d06deda470eabe1bc798fec82c025ae1d944115aa1b791f0583d6ebe3</cites><orcidid>0000-0001-7901-500X ; 0000-0003-4470-0888</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10097554$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10097554$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Zhao, Zhao</creatorcontrib><creatorcontrib>Kan, Hongrui</creatorcontrib><creatorcontrib>Lin, Jiale</creatorcontrib><creatorcontrib>Xu, Zhiyong</creatorcontrib><title>DOA Estimation for Multiple Speech Sources Based on Flexible 
Single-Source Zones and Concentration Weighting</title><title>IEEE sensors journal</title><addtitle>JSEN</addtitle><description>Direction of arrival (DOA) estimation is the key to many audio applications. Recently, sparse component analysis (SCA)-based methods have attracted much attention, in which single-source points (SSPs) and single-source zones (SSZs) where one source is dominant over the others in time-frequency domain are usually detected to construct the pooled histogram containing multi-source DOA information. Nonetheless, the SSZ size in existing methods is fixed and empirically predetermined, which cannot accommodate to the varying spectro-temporal property of speech sources. Furthermore, higher SSP concentration in a SSZ implies a locally stronger dominant source as well as more reliable DOA information extracted therein, which however is also not taken into account yet. To address these problems, a DOA estimation algorithm for multiple speech sources based on flexible SSZs and concentration weighting is presented in this paper. First, in each frame, correlation coefficients of time delay vectors across adjacent frequency bins are calculated to identify SSPs, followed by flexible SSZs construction using varying number of SSPs located at consecutive frequency bins. Next, the number of SSPs in each flexible SSZ is considered as a proxy of corresponding concentration degree, and employed as weighting factor to form the pooled histogram. Finally, a matching pursuit (MP)-based approach is utilized to obtain multi-source DOA estimates. Simulation results reveal that the proposed method significantly outperforms existing approaches in terms of noise floor in pooled histogram, angular resolution, and performance under various signal-to-noise ratio and reverberant conditions. 
Real-world experiments also verify its effectiveness, and meanwhile demonstrate considerably reduced computational complexity.</description><subject>Algorithms</subject><subject>Angular resolution</subject><subject>Bins</subject><subject>concentration weighting</subject><subject>Correlation coefficients</subject><subject>Direction of arrival</subject><subject>Direction-of-arrival estimation</subject><subject>DOA estimation</subject><subject>Estimation</subject><subject>flexible single-source zone</subject><subject>Histograms</subject><subject>Matched pursuit</subject><subject>microphone array</subject><subject>Microphone arrays</subject><subject>Reverberation</subject><subject>Sensors</subject><subject>Signal to noise ratio</subject><subject>Speech</subject><subject>Time lag</subject><subject>Time-frequency analysis</subject><subject>Weighting</subject><issn>1530-437X</issn><issn>1558-1748</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkMtOwzAQRS0EEqXwAUgsLLFO8fgRJ8tSWh4qdFEQiE3kOJM2VUhCnErw9zhKF6xmFufeGR1CLoFNAFh887Sev0w442IieCiiEI7ICJSKAtAyOu53wQIp9McpOXNuxxjEWukRKe9WUzp3XfFluqKuaF639HlfdkVTIl03iHZL1_W-tejorXGYUQ8tSvwp0h4oqk2JwQDQz7rylKkyOqsri1XXDp3vWGy2nUfPyUluSocXhzkmb4v56-whWK7uH2fTZWB5LLtAcWnDVELGwgwzIzVDkyKkVsdRjjbilnFlELJYSgBlDKQ6hpypSGQhpijG5Hrobdr6e4-uS3b-w8qfTHgEUmkOMfcUDJRta-dazJOm9Rra3wRY0ktNeqlJLzU5SPWZqyFTIOI_nnmbSoo_uet0OA</recordid><startdate>20230515</startdate><enddate>20230515</enddate><creator>Zhao, Zhao</creator><creator>Kan, Hongrui</creator><creator>Lin, Jiale</creator><creator>Xu, Zhiyong</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>7U5</scope><scope>8FD</scope><scope>L7M</scope><orcidid>https://orcid.org/0000-0001-7901-500X</orcidid><orcidid>https://orcid.org/0000-0003-4470-0888</orcidid></search><sort><creationdate>20230515</creationdate><title>DOA Estimation for Multiple Speech Sources Based on Flexible Single-Source Zones and Concentration Weighting</title><author>Zhao, Zhao ; Kan, Hongrui ; Lin, Jiale ; Xu, Zhiyong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c294t-524c6b41d06deda470eabe1bc798fec82c025ae1d944115aa1b791f0583d6ebe3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Algorithms</topic><topic>Angular resolution</topic><topic>Bins</topic><topic>concentration weighting</topic><topic>Correlation coefficients</topic><topic>Direction of arrival</topic><topic>Direction-of-arrival estimation</topic><topic>DOA estimation</topic><topic>Estimation</topic><topic>flexible single-source zone</topic><topic>Histograms</topic><topic>Matched pursuit</topic><topic>microphone array</topic><topic>Microphone arrays</topic><topic>Reverberation</topic><topic>Sensors</topic><topic>Signal to noise ratio</topic><topic>Speech</topic><topic>Time lag</topic><topic>Time-frequency analysis</topic><topic>Weighting</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhao, Zhao</creatorcontrib><creatorcontrib>Kan, Hongrui</creatorcontrib><creatorcontrib>Lin, Jiale</creatorcontrib><creatorcontrib>Xu, Zhiyong</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Xplore</collection><collection>CrossRef</collection><collection>Electronics &amp; Communications 
Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>Technology Research Database</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>IEEE sensors journal</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhao, Zhao</au><au>Kan, Hongrui</au><au>Lin, Jiale</au><au>Xu, Zhiyong</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>DOA Estimation for Multiple Speech Sources Based on Flexible Single-Source Zones and Concentration Weighting</atitle><jtitle>IEEE sensors journal</jtitle><stitle>JSEN</stitle><date>2023-05-15</date><risdate>2023</risdate><volume>23</volume><issue>10</issue><spage>1</spage><epage>1</epage><pages>1-1</pages><issn>1530-437X</issn><eissn>1558-1748</eissn><coden>ISJEAZ</coden><abstract>Direction of arrival (DOA) estimation is the key to many audio applications. Recently, sparse component analysis (SCA)-based methods have attracted much attention, in which single-source points (SSPs) and single-source zones (SSZs) where one source is dominant over the others in time-frequency domain are usually detected to construct the pooled histogram containing multi-source DOA information. Nonetheless, the SSZ size in existing methods is fixed and empirically predetermined, which cannot accommodate to the varying spectro-temporal property of speech sources. Furthermore, higher SSP concentration in a SSZ implies a locally stronger dominant source as well as more reliable DOA information extracted therein, which however is also not taken into account yet. To address these problems, a DOA estimation algorithm for multiple speech sources based on flexible SSZs and concentration weighting is presented in this paper. 
First, in each frame, correlation coefficients of time delay vectors across adjacent frequency bins are calculated to identify SSPs, followed by flexible SSZs construction using varying number of SSPs located at consecutive frequency bins. Next, the number of SSPs in each flexible SSZ is considered as a proxy of corresponding concentration degree, and employed as weighting factor to form the pooled histogram. Finally, a matching pursuit (MP)-based approach is utilized to obtain multi-source DOA estimates. Simulation results reveal that the proposed method significantly outperforms existing approaches in terms of noise floor in pooled histogram, angular resolution, and performance under various signal-to-noise ratio and reverberant conditions. Real-world experiments also verify its effectiveness, and meanwhile demonstrate considerably reduced computational complexity.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/JSEN.2023.3263861</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0001-7901-500X</orcidid><orcidid>https://orcid.org/0000-0003-4470-0888</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1530-437X
ispartof IEEE sensors journal, 2023-05, Vol.23 (10), p.1-1
issn 1530-437X
1558-1748
language eng
recordid cdi_crossref_primary_10_1109_JSEN_2023_3263861
source IEEE Xplore
subjects Algorithms
Angular resolution
Bins
concentration weighting
Correlation coefficients
Direction of arrival
Direction-of-arrival estimation
DOA estimation
Estimation
flexible single-source zone
Histograms
Matched pursuit
microphone array
Microphone arrays
Reverberation
Sensors
Signal to noise ratio
Speech
Time lag
Time-frequency analysis
Weighting
title DOA Estimation for Multiple Speech Sources Based on Flexible Single-Source Zones and Concentration Weighting
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T17%3A37%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=DOA%20Estimation%20for%20Multiple%20Speech%20Sources%20Based%20on%20Flexible%20Single-Source%20Zones%20and%20Concentration%20Weighting&rft.jtitle=IEEE%20sensors%20journal&rft.au=Zhao,%20Zhao&rft.date=2023-05-15&rft.volume=23&rft.issue=10&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=1530-437X&rft.eissn=1558-1748&rft.coden=ISJEAZ&rft_id=info:doi/10.1109/JSEN.2023.3263861&rft_dat=%3Cproquest_RIE%3E2814572192%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2814572192&rft_id=info:pmid/&rft_ieee_id=10097554&rfr_iscdi=true
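
The abstract in this record describes building flexible SSZs from SSPs at consecutive frequency bins and weighting the pooled histogram by the number of SSPs in each zone. The following is only a minimal sketch of that idea, not the authors' implementation. It assumes that a per-bin SSP flag and a per-bin DOA estimate in degrees are already available for one frame, and that every SSP in a zone contributes to the histogram with a weight equal to the zone size. The function names `flexible_zones` and `weighted_histogram`, the 1-degree histogram resolution, and the per-SSP weighting rule are all hypothetical choices made for illustration.

```python
import numpy as np


def flexible_zones(is_ssp):
    """Group consecutive SSP bins into zones, returned as (start, stop) with stop exclusive."""
    zones = []
    start = None
    for k, flag in enumerate(is_ssp):
        if flag and start is None:
            start = k                      # a new run of SSPs begins
        elif not flag and start is not None:
            zones.append((start, k))       # the run ended at bin k - 1
            start = None
    if start is not None:
        zones.append((start, len(is_ssp)))  # run reaches the last bin
    return zones


def weighted_histogram(doa_deg, is_ssp, n_bins=360):
    """Pooled histogram in which each SSP is weighted by the size of its zone (assumed rule)."""
    hist = np.zeros(n_bins)
    for start, stop in flexible_zones(is_ssp):
        weight = stop - start              # number of SSPs in this zone
        for k in range(start, stop):
            idx = int(round(doa_deg[k])) % n_bins
            hist[idx] += weight
    return hist
```

Under these assumptions, an isolated SSP adds 1 to its histogram bin, while each SSP in a run of three adds 3, so zones with a higher SSP concentration dominate the histogram peaks.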