Phase-aware subspace decomposition for single channel speech separation

Bibliographic details
Published in:	IET Signal Processing, 2020-06, Vol. 14 (4), pp. 214-222
Main authors:	Wiem, Belhedi; Mohamed Anouar, Ben Messaoud; Aïcha, Bouzid
Format:	Article
Language:	English
Subjects:	acoustic signal processing; adaptive thresholding; audio signal processing; blind source separation; challenging acoustic environments; concurrent speech; final subspace recovery; human‐to‐machine communication; iterative decomposition; iterative methods; low‐rank subspace; minimisation; phase‐aware mask; phase‐aware subspace decomposition; phase‐information; Research Article; robotics; SCSS system; separated signals; separation results; single channel speech separation; smart homes; source separation; sparse rank subspace; speech distortion; speech enhancement; speech processing; speech recognition; speech separation systems

Description
Abstract:	Single channel speech separation (SCSS) is often required as post-processing in applications that facilitate automatic human-to-human or human-to-machine communication in challenging acoustic environments, such as voice command for smart homes or robotics. The proposed SCSS system, which the authors call phase-aware subspace decomposition (PASD), relies on subspace decomposition for speech separation, followed by a phase-aware mask for final subspace recovery. Specifically, the approach decomposes the mixture into sparse and low-rank subspaces in the frequency domain through rank minimisation. The minimisation relies on iterative decomposition, with adaptive thresholding in each iteration to achieve soft estimation, and phase information is taken into account for reconstruction. Separation results are reported in terms of both intrusive and non-intrusive metrics, using realistic recordings corrupted with real-life noises. As speech separation systems are expected to achieve maximal interference rejection without speech distortion, the authors also evaluate the proposed system by recognising speech from a target speaker in the presence of either concurrent speech or noise. Recognition results show that the separated signals are of high intelligibility, so they can be exploited by other automatic applications. The extensive evaluation under different test scenarios shows that PASD consistently improves the quality of the separated signals compared with other benchmark approaches.
ISSN:	1751-9675, 1751-9683
DOI:	10.1049/iet-spr.2019.0373
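
Illustrative sketch (not part of the catalogued record): the abstract describes splitting a mixture spectrogram into low-rank and sparse parts by iterative thresholding, then masking. The Python below is a minimal sketch of that general idea only. It substitutes the standard inexact augmented Lagrangian method for robust PCA, whose thresholds tighten at every iteration, and a plain ratio mask that reuses the mixture phase; it is not the PASD algorithm or its phase-aware mask. The function names, the parameters `lam`, `rho` and `tol`, the synthetic data, and the choice of treating the sparse part as the target are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(x, tau):
    """Element-wise shrinkage: pull every value toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def sparse_low_rank_split(M, lam=None, rho=1.5, tol=1e-7, max_iter=100):
    """Split a non-zero real matrix M into low-rank L plus sparse S.

    Generic inexact-ALM robust PCA, used here as a stand-in; the paper's
    own decomposition and thresholding rule are not given in the abstract.
    """
    M = np.asarray(M, dtype=float)
    if lam is None:
        lam = 1.0 / np.sqrt(max(M.shape))  # common default weight (assumption)
    sigma1 = np.linalg.norm(M, 2)          # largest singular value of M
    fro = np.linalg.norm(M, "fro")
    Y = M / max(sigma1, np.abs(M).max() / lam)  # scaled dual variable
    mu = 1.25 / sigma1
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(max_iter):
        # Low-rank step: shrink singular values by the current threshold 1/mu.
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * soft_threshold(s, 1.0 / mu)) @ Vt
        # Sparse step: shrink individual entries by lam/mu.
        S = soft_threshold(M - L + Y / mu, lam / mu)
        R = M - L - S                      # what the split does not yet explain
        Y = Y + mu * R
        mu *= rho                          # both thresholds tighten each pass
        if np.linalg.norm(R, "fro") <= tol * fro:
            break
    return L, S


def ratio_mask(S, L, eps=1e-12):
    """Soft mask in [0, 1) favouring the sparse component (assumed target)."""
    return np.abs(S) / (np.abs(S) + np.abs(L) + eps)


# Synthetic stand-in for a complex spectrogram (frequency bins x frames).
rng = np.random.default_rng(0)
magnitude = rng.random((40, 2)) @ rng.random((2, 60))   # low-rank background
magnitude[rng.random((40, 60)) < 0.05] += 5.0           # sparse bursts
phase = rng.uniform(-np.pi, np.pi, size=magnitude.shape)
X = magnitude * np.exp(1j * phase)

L, S = sparse_low_rank_split(np.abs(X))
target_estimate = ratio_mask(S, L) * X   # masking the complex mixture keeps its phase
```

With real audio, `X` would come from a short-time Fourier transform of the recording, and the masked result would be inverted back to a waveform; both steps are left out to keep the sketch dependency-free.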