On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks

Speech enhancement (SE) systems aim to improve the quality and intelligibility of degraded speech signals obtained from far-field microphones. Subjective evaluation of the intelligibility performance of these SE systems is uncommon. Instead, objective intelligibility measures (OIMs) are generally us...

Detailed description

Bibliographic details
Published in: IEEE/ACM transactions on audio, speech, and language processing, 2024, Vol.32, p.215-226
Main authors: Gelderblom, Femke B.; Tronstad, Tron Vedul; Svendsen, Torbjorn; Myrvoll, Tor Andre
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 226
container_issue
container_start_page 215
container_title IEEE/ACM transactions on audio, speech, and language processing
container_volume 32
creator Gelderblom, Femke B.
Tronstad, Tron Vedul
Svendsen, Torbjorn
Myrvoll, Tor Andre
description Speech enhancement (SE) systems aim to improve the quality and intelligibility of degraded speech signals obtained from far-field microphones. Subjective evaluation of the intelligibility performance of these SE systems is uncommon. Instead, objective intelligibility measures (OIMs) are generally used to predict subjective performance increases. Many recent deep learning (DL) based SE systems are expected to improve the intelligibility of degraded speech as measured by OIMs. However, validation of the ability of these OIMs to predict subjective intelligibility when enhancing a speech signal using DL-based systems is lacking. Therefore, in this study, we evaluate the predictive performance of five popular OIMs. We compare the metrics' predictions with subjective results. For this purpose, we recruited 50 human listeners, and subjectively tested both single-channel and multi-channel Deep Complex Convolutional Recurrent Network (DCCRN) based speech enhancement systems. We found that none of the OIMs gave reliable predictions, and that all OIMs overestimated the intelligibility of 'enhanced' speech signals.
doi_str_mv 10.1109/TASLP.2023.3329378
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TASLP_2023_3329378</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10304336</ieee_id><sourcerecordid>2890103598</sourcerecordid><originalsourceid>FETCH-LOGICAL-c340t-75a94147e1b868fd36b051ae8d3aa04d6bed880bb0bc46999b72007f731bd3f43</originalsourceid><addsrcrecordid>eNpNkc1OwzAQhCMEEgj6AoiDJc4p6zhN4iMqv1KhFYVzFCcb6pLEwXZaeBpeFacBxGlHo5nVaj_PO6UwphT4xfPlcrYYBxCwMWMBZ3Gy5x0FTvmcQbj_qwMOh97ImDUAUIg5j8Mj72veELtCstBYyNzKjZNqi5qokszFGgfrvrFYVfJVCllJ-0ke0GqZG1IqvWsvu7_oArVz66zJsd9xhdiSqarbCj_cbDaq6qxUTVaRJ8w7rbGxZNki5ity3az6Wt1bj2i3Sr-ZE--gzCqDo5957L3cXD9P7_zZ_PZ-ejnzcxaC9eNJxkMaxkhFEiVlwSIBE5phUrAsg7CIBBZJAkKAyMOIcy7iACAuY0ZFwcqQHXvnw95Wq_cOjU3XqtPuTJMGCXf_YhOeuFQwpHKtjNFYpq2WdaY_UwppzyLdsUh7FukPC1c6G0oSEf8VHBvGIvYNrzCIwg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2890103598</pqid></control><display><type>article</type><title>On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks</title><source>IEEE Electronic Library (IEL)</source><creator>Gelderblom, Femke B. ; Tronstad, Tron Vedul ; Svendsen, Torbjorn ; Myrvoll, Tor Andre</creator><creatorcontrib>Gelderblom, Femke B. ; Tronstad, Tron Vedul ; Svendsen, Torbjorn ; Myrvoll, Tor Andre</creatorcontrib><description>Speech enhancement (SE) systems aim to improve the quality and intelligibility of degraded speech signals obtained from far-field microphones. Subjective evaluation of the intelligibility performance of these SE systems is uncommon. Instead, objective intelligibility measures (OIMs) are generally used to predict subjective performance increases. Many recent deep learning (DL) based SE systems, are expected to improve the intelligibility of degraded speech as measured by OIMs. 
However, validation of the ability of these OIMs to predict subjective intelligibility when enhancing a speech signal using DL-based systems is lacking. Therefore, in this study, we evaluate the predictive performance of five popular OIMs. We compare the metrics' predictions with subjective results. For this purpose, we recruited 50 human listeners, and subjectively tested both single channel and multi-channel Deep Complex Convolutional Recurrent Network (DCCRN) based speech enhancement systems. We found that none of the OIMs gave reliable predictions, and that all OIMs overestimated the intelligibility of 'enhanced' speech signals.</description><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TASLP.2023.3329378</identifier><identifier>CODEN: ITASFA</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Auditory system ; Channel estimation ; Intelligibility ; Measurement ; Noise measurement ; objective metrics ; Performance evaluation ; Performance prediction ; Speech ; Speech enhancement ; Speech processing ; Speech recognition ; subjective evaluation ; Testing</subject><ispartof>IEEE/ACM transactions on audio, speech, and language processing, 2024, Vol.32, p.215-226</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2024</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c340t-75a94147e1b868fd36b051ae8d3aa04d6bed880bb0bc46999b72007f731bd3f43</citedby><cites>FETCH-LOGICAL-c340t-75a94147e1b868fd36b051ae8d3aa04d6bed880bb0bc46999b72007f731bd3f43</cites><orcidid>0000-0003-0578-7941 ; 0000-0001-6286-9148 ; 0000-0002-3329-7109 ; 0000-0002-1034-4427</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10304336$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,4024,27923,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10304336$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Gelderblom, Femke B.</creatorcontrib><creatorcontrib>Tronstad, Tron Vedul</creatorcontrib><creatorcontrib>Svendsen, Torbjorn</creatorcontrib><creatorcontrib>Myrvoll, Tor Andre</creatorcontrib><title>On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks</title><title>IEEE/ACM transactions on audio, speech, and language processing</title><addtitle>TASLP</addtitle><description>Speech enhancement (SE) systems aim to improve the quality and intelligibility of degraded speech signals obtained from far-field microphones. Subjective evaluation of the intelligibility performance of these SE systems is uncommon. Instead, objective intelligibility measures (OIMs) are generally used to predict subjective performance increases. Many recent deep learning (DL) based SE systems, are expected to improve the intelligibility of degraded speech as measured by OIMs. 
However, validation of the ability of these OIMs to predict subjective intelligibility when enhancing a speech signal using DL-based systems is lacking. Therefore, in this study, we evaluate the predictive performance of five popular OIMs. We compare the metrics' predictions with subjective results. For this purpose, we recruited 50 human listeners, and subjectively tested both single channel and multi-channel Deep Complex Convolutional Recurrent Network (DCCRN) based speech enhancement systems. We found that none of the OIMs gave reliable predictions, and that all OIMs overestimated the intelligibility of 'enhanced' speech signals.</description><subject>Auditory system</subject><subject>Channel estimation</subject><subject>Intelligibility</subject><subject>Measurement</subject><subject>Noise measurement</subject><subject>objective metrics</subject><subject>Performance evaluation</subject><subject>Performance prediction</subject><subject>Speech</subject><subject>Speech enhancement</subject><subject>Speech processing</subject><subject>Speech recognition</subject><subject>subjective evaluation</subject><subject>Testing</subject><issn>2329-9290</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkc1OwzAQhCMEEgj6AoiDJc4p6zhN4iMqv1KhFYVzFCcb6pLEwXZaeBpeFacBxGlHo5nVaj_PO6UwphT4xfPlcrYYBxCwMWMBZ3Gy5x0FTvmcQbj_qwMOh97ImDUAUIg5j8Mj72veELtCstBYyNzKjZNqi5qokszFGgfrvrFYVfJVCllJ-0ke0GqZG1IqvWsvu7_oArVz66zJsd9xhdiSqarbCj_cbDaq6qxUTVaRJ8w7rbGxZNki5ity3az6Wt1bj2i3Sr-ZE--gzCqDo5957L3cXD9P7_zZ_PZ-ejnzcxaC9eNJxkMaxkhFEiVlwSIBE5phUrAsg7CIBBZJAkKAyMOIcy7iACAuY0ZFwcqQHXvnw95Wq_cOjU3XqtPuTJMGCXf_YhOeuFQwpHKtjNFYpq2WdaY_UwppzyLdsUh7FukPC1c6G0oSEf8VHBvGIvYNrzCIwg</recordid><startdate>2024</startdate><enddate>2024</enddate><creator>Gelderblom, Femke B.</creator><creator>Tronstad, Tron Vedul</creator><creator>Svendsen, Torbjorn</creator><creator>Myrvoll, Tor 
Andre</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0003-0578-7941</orcidid><orcidid>https://orcid.org/0000-0001-6286-9148</orcidid><orcidid>https://orcid.org/0000-0002-3329-7109</orcidid><orcidid>https://orcid.org/0000-0002-1034-4427</orcidid></search><sort><creationdate>2024</creationdate><title>On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks</title><author>Gelderblom, Femke B. ; Tronstad, Tron Vedul ; Svendsen, Torbjorn ; Myrvoll, Tor Andre</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c340t-75a94147e1b868fd36b051ae8d3aa04d6bed880bb0bc46999b72007f731bd3f43</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Auditory system</topic><topic>Channel estimation</topic><topic>Intelligibility</topic><topic>Measurement</topic><topic>Noise measurement</topic><topic>objective metrics</topic><topic>Performance evaluation</topic><topic>Performance prediction</topic><topic>Speech</topic><topic>Speech enhancement</topic><topic>Speech processing</topic><topic>Speech recognition</topic><topic>subjective evaluation</topic><topic>Testing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Gelderblom, Femke B.</creatorcontrib><creatorcontrib>Tronstad, Tron Vedul</creatorcontrib><creatorcontrib>Svendsen, Torbjorn</creatorcontrib><creatorcontrib>Myrvoll, Tor Andre</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 
1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE/ACM transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Gelderblom, Femke B.</au><au>Tronstad, Tron Vedul</au><au>Svendsen, Torbjorn</au><au>Myrvoll, Tor Andre</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks</atitle><jtitle>IEEE/ACM transactions on audio, speech, and language processing</jtitle><stitle>TASLP</stitle><date>2024</date><risdate>2024</risdate><volume>32</volume><spage>215</spage><epage>226</epage><pages>215-226</pages><issn>2329-9290</issn><eissn>2329-9304</eissn><coden>ITASFA</coden><abstract>Speech enhancement (SE) systems aim to improve the quality and intelligibility of degraded speech signals obtained from far-field microphones. Subjective evaluation of the intelligibility performance of these SE systems is uncommon. Instead, objective intelligibility measures (OIMs) are generally used to predict subjective performance increases. Many recent deep learning (DL) based SE systems, are expected to improve the intelligibility of degraded speech as measured by OIMs. 
However, validation of the ability of these OIMs to predict subjective intelligibility when enhancing a speech signal using DL-based systems is lacking. Therefore, in this study, we evaluate the predictive performance of five popular OIMs. We compare the metrics' predictions with subjective results. For this purpose, we recruited 50 human listeners, and subjectively tested both single channel and multi-channel Deep Complex Convolutional Recurrent Network (DCCRN) based speech enhancement systems. We found that none of the OIMs gave reliable predictions, and that all OIMs overestimated the intelligibility of 'enhanced' speech signals.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TASLP.2023.3329378</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0003-0578-7941</orcidid><orcidid>https://orcid.org/0000-0001-6286-9148</orcidid><orcidid>https://orcid.org/0000-0002-3329-7109</orcidid><orcidid>https://orcid.org/0000-0002-1034-4427</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 2329-9290
ispartof IEEE/ACM transactions on audio, speech, and language processing, 2024, Vol.32, p.215-226
issn 2329-9290
2329-9304
language eng
recordid cdi_crossref_primary_10_1109_TASLP_2023_3329378
source IEEE Electronic Library (IEL)
subjects Auditory system
Channel estimation
Intelligibility
Measurement
Noise measurement
objective metrics
Performance evaluation
Performance prediction
Speech
Speech enhancement
Speech processing
Speech recognition
subjective evaluation
Testing
title On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T13%3A31%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=On%20the%20Predictive%20Power%20of%20Objective%20Intelligibility%20Metrics%20for%20the%20Subjective%20Performance%20of%20Deep%20Complex%20Convolutional%20Recurrent%20Speech%20Enhancement%20Networks&rft.jtitle=IEEE/ACM%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Gelderblom,%20Femke%20B.&rft.date=2024&rft.volume=32&rft.spage=215&rft.epage=226&rft.pages=215-226&rft.issn=2329-9290&rft.eissn=2329-9304&rft.coden=ITASFA&rft_id=info:doi/10.1109/TASLP.2023.3329378&rft_dat=%3Cproquest_RIE%3E2890103598%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2890103598&rft_id=info:pmid/&rft_ieee_id=10304336&rfr_iscdi=true