Deep Complex-Valued Neural Network-Based Triple-Path Mask and Steering Vector Estimation for Multichannel Target Speech Separation
We propose a deep complex-valued neural network-based beamforming framework for multichannel target speech separation. The deep complex-valued neural network predicts steering vectors and complex ratio masks for speaker signals. The masked signals are then used to calculate the spatial covariance ma...
Published in: | Journal of Signal Processing 2023/07/01, Vol.27(4), pp.87-91 |
---|---|
Main authors: | Qin, Mohan ; Li, Li ; Makino, Shoji |
Format: | Article |
Language: | eng ; jpn |
Subjects: | Beamforming ; Covariance matrix ; Neural networks ; Separation ; Speech ; Steering |
Online access: | Full text |
container_end_page | 91 |
---|---|
container_issue | 4 |
container_start_page | 87 |
container_title | Journal of Signal Processing |
container_volume | 27 |
creator | Qin, Mohan ; Li, Li ; Makino, Shoji |
description | We propose a deep complex-valued neural network-based beamforming framework for multichannel target speech separation. The deep complex-valued neural network predicts steering vectors and complex ratio masks for speaker signals. The masked signals are then used to calculate the spatial covariance matrices needed for minimum variance distortionless response (MVDR) beamforming. We propose triple-path modeling for mask estimation, which takes both intrachannel and interchannel features into consideration. Our experimental results revealed that the proposed framework achieves better target speech separation performance than do the baseline methods. |
doi_str_mv | 10.2299/jsp.27.87 |
format | Article |
publisher | Tokyo: Research Institute of Signal Processing, Japan |
rights | 2023 Research Institute of Signal Processing, Japan ; Copyright Japan Science and Technology Agency 2023 |
peer_reviewed | yes |
oa | free_for_read |
tpages | 5 |
fulltext | fulltext |
identifier | ISSN: 1342-6230 |
ispartof | Journal of Signal Processing, 2023/07/01, Vol.27(4), pp.87-91 |
issn | 1342-6230 ; 1880-1013 |
language | eng ; jpn |
recordid | cdi_proquest_journals_2864832726 |
source | J-STAGE Free; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
subjects | Beamforming ; Covariance matrix ; Neural networks ; Separation ; Speech ; Steering |
title | Deep Complex-Valued Neural Network-Based Triple-Path Mask and Steering Vector Estimation for Multichannel Target Speech Separation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T12%3A23%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20Complex-Valued%20Neural%20Network-Based%20Triple-Path%20Mask%20and%20Steering%20Vector%20Estimation%20for%20Multichannel%20Target%20Speech%20Separation&rft.jtitle=Journal%20of%20Signal%20Processing&rft.au=Qin,%20Mohan&rft.date=2023-07-01&rft.volume=27&rft.issue=4&rft.spage=87&rft.epage=91&rft.pages=87-91&rft.issn=1342-6230&rft.eissn=1880-1013&rft_id=info:doi/10.2299/jsp.27.87&rft_dat=%3Cproquest_cross%3E2864832726%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2864832726&rft_id=info:pmid/&rfr_iscdi=true |
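
The abstract describes a pipeline in which masked signals yield spatial covariance matrices that, together with a steering vector, feed an MVDR beamformer. The sketch below illustrates only that final step using the textbook MVDR solution w = R^{-1} d / (d^H R^{-1} d); it is not the authors' implementation. The function names, array shapes, diagonal-loading constant, and the choice to build the covariance matrix from an interference mask are all assumptions made for illustration. In the paper the masks and steering vectors come from the neural network; here they are simply inputs.

```python
import numpy as np


def spatial_covariance(masked):
    """Spatial covariance of one frequency bin.

    masked: complex array of shape (channels, frames).
    Returns the frame-averaged outer product x_t x_t^H.
    """
    return masked @ masked.conj().T / masked.shape[1]


def mvdr_weights(noise_scm, steering, eps=1e-6):
    """Textbook MVDR weights w = R^{-1} d / (d^H R^{-1} d).

    noise_scm: (channels, channels) covariance of the unwanted signal.
    steering:  (channels,) steering vector toward the target.
    eps:       hypothetical diagonal-loading factor for numerical stability.
    """
    channels = noise_scm.shape[0]
    loading = eps * np.trace(noise_scm).real / channels
    loaded = noise_scm + loading * np.eye(channels)
    numerator = np.linalg.solve(loaded, steering)  # R^{-1} d
    return numerator / (steering.conj() @ numerator)


def mvdr_separate(mixture, interference_mask, steering):
    """Apply mask-based MVDR independently in every frequency bin.

    mixture:           (channels, freqs, frames) complex STFT.
    interference_mask: (channels, freqs, frames) complex ratio mask
                       selecting everything except the target (assumed).
    steering:          (freqs, channels) steering vectors (assumed given).
    Returns a (freqs, frames) single-channel estimate of the target.
    """
    _, n_freq, n_frames = mixture.shape
    output = np.empty((n_freq, n_frames), dtype=complex)
    for f in range(n_freq):
        masked = interference_mask[:, f, :] * mixture[:, f, :]
        scm = spatial_covariance(masked)
        w = mvdr_weights(scm, steering[f])
        output[f] = w.conj() @ mixture[:, f, :]  # y_t = w^H x_t
    return output
```

By construction the weights satisfy the distortionless constraint w^H d = 1, so a signal arriving exactly along the steering vector passes through unchanged while the output power contributed by the masked interference is minimized.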