iPhonMatchNet: Zero-Shot User-Defined Keyword Spotting Using Implicit Acoustic Echo Cancellation
Main authors: | , |
---|---|
Format: | Article |
Language: | eng |
Summary: | In response to the increasing interest in human-machine communication across
various domains, this paper introduces a novel approach called iPhonMatchNet,
which addresses the challenge of barge-in scenarios, wherein user speech
overlaps with device playback audio, thereby creating a self-referencing
problem. The proposed model leverages implicit acoustic echo cancellation
(iAEC) techniques to increase the efficiency of user-defined keyword spotting
models, achieving a remarkable 95% reduction in mean absolute error with a
minimal increase in model size (0.13%) compared to the baseline model,
PhonMatchNet. We also present an efficient model structure and demonstrate its
capability to learn iAEC functionality without requiring a clean signal. The
findings of our study indicate that the proposed model achieves competitive
performance in real-world deployment conditions of smart devices. |
---|---|
DOI: | 10.48550/arxiv.2309.06096 |