On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks
Speech enhancement (SE) systems aim to improve the quality and intelligibility of degraded speech signals obtained from far-field microphones. Subjective evaluation of the intelligibility performance of these SE systems is uncommon. Instead, objective intelligibility measures (OIMs) are generally us...
Published in: | IEEE/ACM transactions on audio, speech, and language processing, 2024, Vol.32, p.215-226 |
---|---|
Main authors: | Gelderblom, Femke B. ; Tronstad, Tron Vedul ; Svendsen, Torbjorn ; Myrvoll, Tor Andre |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 226 |
---|---|
container_issue | |
container_start_page | 215 |
container_title | IEEE/ACM transactions on audio, speech, and language processing |
container_volume | 32 |
creator | Gelderblom, Femke B. ; Tronstad, Tron Vedul ; Svendsen, Torbjorn ; Myrvoll, Tor Andre |
description | Speech enhancement (SE) systems aim to improve the quality and intelligibility of degraded speech signals obtained from far-field microphones. Subjective evaluation of the intelligibility performance of these SE systems is uncommon. Instead, objective intelligibility measures (OIMs) are generally used to predict subjective performance increases. Many recent deep learning (DL) based SE systems are expected to improve the intelligibility of degraded speech as measured by OIMs. However, validation of the ability of these OIMs to predict subjective intelligibility when enhancing a speech signal using DL-based systems is lacking. Therefore, in this study, we evaluate the predictive performance of five popular OIMs. We compare the metrics' predictions with subjective results. For this purpose, we recruited 50 human listeners, and subjectively tested both single-channel and multi-channel Deep Complex Convolutional Recurrent Network (DCCRN) based speech enhancement systems. We found that none of the OIMs gave reliable predictions, and that all OIMs overestimated the intelligibility of 'enhanced' speech signals. |
doi_str_mv | 10.1109/TASLP.2023.3329378 |
format | Article |
publisher | Piscataway: IEEE |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
coden | ITASFA |
orcid | 0000-0003-0578-7941 ; 0000-0001-6286-9148 ; 0000-0002-3329-7109 ; 0000-0002-1034-4427 |
tpages | 12 |
toplevel | peer_reviewed ; online_resources |
oa | free_for_read |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2329-9290 |
ispartof | IEEE/ACM transactions on audio, speech, and language processing, 2024, Vol.32, p.215-226 |
issn | 2329-9290 2329-9304 |
language | eng |
recordid | cdi_crossref_primary_10_1109_TASLP_2023_3329378 |
source | IEEE Electronic Library (IEL) |
subjects | Auditory system ; Channel estimation ; Intelligibility ; Measurement ; Noise measurement ; objective metrics ; Performance evaluation ; Performance prediction ; Speech ; Speech enhancement ; Speech processing ; Speech recognition ; subjective evaluation ; Testing |
title | On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T13%3A31%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=On%20the%20Predictive%20Power%20of%20Objective%20Intelligibility%20Metrics%20for%20the%20Subjective%20Performance%20of%20Deep%20Complex%20Convolutional%20Recurrent%20Speech%20Enhancement%20Networks&rft.jtitle=IEEE/ACM%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Gelderblom,%20Femke%20B.&rft.date=2024&rft.volume=32&rft.spage=215&rft.epage=226&rft.pages=215-226&rft.issn=2329-9290&rft.eissn=2329-9304&rft.coden=ITASFA&rft_id=info:doi/10.1109/TASLP.2023.3329378&rft_dat=%3Cproquest_RIE%3E2890103598%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2890103598&rft_id=info:pmid/&rft_ieee_id=10304336&rfr_iscdi=true |