Mitigating Information Interruptions by COVID-19 Face Masks: A Three-Stage Speech Enhancement Scheme

Bibliographic details
Published in:	IEEE Transactions on Computational Social Systems, 2024-08, Vol. 11 (4), pp. 4790-4799
Main authors:	Dash, Tusar Kanti; Chakraborty, Chinmay; Mahapatra, Satyajit; Panda, Ganapati
Format:	Article
Language:	English
Keywords:	Algorithms; Coronavirus disease 2019 (COVID-19); Coronaviruses; COVID-19 face mask; Face recognition; Faces; Feature extraction; Frequencies; gray wolf optimizer (GWO); information interruptions; Masks; Neural networks; Noise measurement; Parameter modification; Q-factor; Radial basis function; Signal quality; Signal to noise ratio; Speech; Speech enhancement; speech enhancement (SE); Speech processing; Speech recognition; Subtraction; tunable Q-factor wavelet transform (TQWT); Viral diseases; Voice communication; Wavelet transforms

Description
Abstract:	The coronavirus disease 2019 (COVID-19) preventive measures have resulted in significant lifestyle changes. One feature of the COVID-19 new normal is the use of face masks for protection against airborne aerosols, which creates distractions and interruptions in voice communication. A face mask influences speech differently from the standard concept of noise affecting speech communication, and its effects vary across frequency bands. To address this problem, a three-stage adaptive speech enhancement (SE) scheme is developed in this article. In the first stage, tunable Q-factor wavelet transform (TQWT) features are extracted from the input speech signal by properly setting the quality factor values and the number of levels. In the second stage, the adjustable parameters of the preemphasis filter and the modified multiband spectral subtraction (MBSS) are determined using bio-inspired techniques for different masking and signal-to-noise ratio (SNR) conditions. In the third stage, the weights, center values, and standard deviations of the Gaussian radial basis functions, together with the input patterns of the radial basis function neural networks (RBFNNs), are updated to predict the optimized parameters from the input TQWT-based cepstral features (TQCFs). Finally, the performance of the proposed algorithm is compared with that of standard SE algorithms using two speech datasets.
ISSN:	2329-924X, 2373-7476
DOI:	10.1109/TCSS.2022.3210988
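
The second stage named in the abstract combines a preemphasis filter with multiband spectral subtraction, both of which have tunable parameters. The sketch below is a generic textbook form of those two operations, not the article's modified MBSS: the function names, the per-band over-subtraction factors `alphas`, the band layout `band_edges`, and the spectral `floor` are all illustrative assumptions, and the bio-inspired search that would choose their values is not shown.

```python
import numpy as np


def preemphasis(x, alpha=0.97):
    """First-order preemphasis: y[n] = x[n] - alpha * x[n-1], with y[0] = x[0].

    alpha is an assumed tunable coefficient, not a value from the article.
    """
    x = np.asarray(x, dtype=float)
    return np.concatenate(([x[0]], x[1:] - alpha * x[:-1]))


def multiband_subtract(noisy_power, noise_power, band_edges, alphas, floor=0.01):
    """Band-wise spectral subtraction on one frame's power spectrum.

    band_edges: bin indices delimiting the bands, e.g. [0, 2, 4] (assumed layout).
    alphas:     one over-subtraction factor per band (hypothetical parameters).
    floor:      fraction of the noisy power kept as a lower bound, so the
                result never becomes negative.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    clean = np.empty_like(noisy_power)
    bands = zip(band_edges[:-1], band_edges[1:])
    for (lo, hi), a in zip(bands, alphas):
        diff = noisy_power[lo:hi] - a * noise_power[lo:hi]
        clean[lo:hi] = np.maximum(diff, floor * noisy_power[lo:hi])
    return clean
```

Giving each band its own factor is what lets such a scheme treat frequency regions differently, which matches the abstract's remark that masks affect speech unevenly across frequency bands. The bands are assumed to cover every bin of the frame, from index 0 to the spectrum length.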