Leveraging Non-Causal Knowledge via Cross-Network Knowledge Distillation for Real-Time Speech Enhancement


Detailed Description

Bibliographic Details
Published in: IEEE Signal Processing Letters, 2024, Vol. 31, pp. 1129-1133
Main Authors: Park, Hyun Joon; Shin, Wooseok; Kim, Jin Sob; Han, Sung Won
Format: Article
Language: English
Subjects:
Online Access: Order full text
Description
Summary: To improve real-time speech enhancement (SE) while maintaining efficiency, researchers have adopted knowledge distillation (KD). However, when the teacher model is of the same network type as the real-time SE student model, the teacher's performance can be unsatisfactory, which limits the effectiveness of KD. To overcome this limitation, we propose cross-network non-causal knowledge distillation (CNNC-Distill). CNNC-Distill enables knowledge transfer between networks of different types, allowing the use of a teacher model whose network type differs from that of the real-time SE student model. To maximize the KD effect, a non-real-time SE model unconstrained by causality conditions is adopted as the teacher model. CNNC-Distill transfers the non-causal knowledge of the non-real-time SE teacher model to a real-time SE student model using feature and output distillation. We also introduce a time-domain network, RT-SENet, used as the real-time SE student model. Results on the Valentini dataset show the efficiency of RT-SENet and the significant performance improvement achieved by CNNC-Distill.
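The combined feature-and-output distillation objective described in the summary can be sketched as below. This is a minimal illustration, not the paper's actual loss: the weighting factors `alpha`/`beta`, the choice of L1 for the output term and MSE for the feature term, and the assumption that teacher and student feature maps have matching shapes (in practice, cross-network distillation typically needs projection layers to align them) are all hypothetical.

```python
import numpy as np

def distillation_loss(student_out, teacher_out,
                      student_feats, teacher_feats,
                      alpha=1.0, beta=1.0):
    """Hypothetical KD objective: output distillation plus
    layer-wise feature distillation (shapes assumed aligned)."""
    # Output distillation: match the student's enhanced waveform
    # to the non-causal teacher's output (L1 distance here).
    out_loss = np.mean(np.abs(student_out - teacher_out))
    # Feature distillation: match intermediate representations
    # layer by layer (MSE here).
    feat_loss = sum(np.mean((s - t) ** 2)
                    for s, t in zip(student_feats, teacher_feats))
    return alpha * out_loss + beta * feat_loss
```

In a training loop, this term would be added to the usual SE reconstruction loss against the clean target, so the real-time student learns both from the data and from the non-causal teacher's richer representations.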
ISSN: 1070-9908, 1558-2361
DOI: 10.1109/LSP.2024.3388956