On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks

Speech enhancement (SE) systems aim to improve the quality and intelligibility of degraded speech signals obtained from far-field microphones. Subjective evaluation of the intelligibility performance of these SE systems is uncommon. Instead, objective intelligibility measures (OIMs) are generally us...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE/ACM transactions on audio, speech, and language processing speech, and language processing, 2024, Vol.32, p.215-226
Hauptverfasser: Gelderblom, Femke B., Tronstad, Tron Vedul, Svendsen, Torbjorn, Myrvoll, Tor Andre
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Speech enhancement (SE) systems aim to improve the quality and intelligibility of degraded speech signals obtained from far-field microphones. Subjective evaluation of the intelligibility performance of these SE systems is uncommon. Instead, objective intelligibility measures (OIMs) are generally used to predict subjective performance increases. Many recent deep learning (DL) based SE systems, are expected to improve the intelligibility of degraded speech as measured by OIMs. However, validation of the ability of these OIMs to predict subjective intelligibility when enhancing a speech signal using DL-based systems is lacking. Therefore, in this study, we evaluate the predictive performance of five popular OIMs. We compare the metrics' predictions with subjective results. For this purpose, we recruited 50 human listeners, and subjectively tested both single channel and multi-channel Deep Complex Convolutional Recurrent Network (DCCRN) based speech enhancement systems. We found that none of the OIMs gave reliable predictions, and that all OIMs overestimated the intelligibility of 'enhanced' speech signals.
ISSN:2329-9290
2329-9304
DOI:10.1109/TASLP.2023.3329378