Knowledge Transfer for Efficient On-device False Trigger Mitigation
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online Access: | Order full text |
Abstract: | In this paper, we address the task of determining whether a given utterance
is directed towards a voice-enabled smart-assistant device or not. An
undirected utterance is termed a "false trigger", and false trigger
mitigation (FTM) is essential for designing a privacy-centric, non-intrusive
smart assistant. The directedness of an utterance can be identified by running
automatic speech recognition (ASR) on it and determining the user intent by
analyzing the ASR transcript. But in the case of a false trigger, transcribing the
audio using ASR itself is strongly undesirable. To alleviate this issue, we
propose an LSTM-based FTM architecture which determines the user intent from
acoustic features directly without explicitly generating ASR transcripts from
the audio. The proposed models have a small footprint and can be run on-device
with limited computational resources. During training, the model parameters are
optimized using a knowledge transfer approach where a more accurate
self-attention graph neural network model serves as the teacher. Given the
whole audio snippets, our approach mitigates 87% of false triggers at 99% true
positive rate (TPR), and in a streaming audio scenario, the system listens to
only 1.69s of the false trigger audio before rejecting it while achieving the
same TPR. |
---|---|
DOI: | 10.48550/arxiv.2010.10591 |
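
The headline figure in the abstract (a share of false triggers mitigated at a fixed true positive rate) describes an operating point on a score threshold. The following is a minimal sketch of how such a figure could be computed, not the authors' evaluation code. It assumes each utterance receives a scalar score where higher means "more likely directed at the device"; the function names `threshold_at_tpr` and `mitigation_rate` are hypothetical.

```python
import math


def threshold_at_tpr(true_scores, target_tpr=0.99):
    """Return the highest threshold that still accepts at least
    `target_tpr` of the true (device-directed) triggers.

    Assumption: an utterance is accepted when its score >= threshold.
    """
    if not true_scores:
        raise ValueError("true_scores must not be empty")
    ranked = sorted(true_scores, reverse=True)
    # Number of true triggers that must be accepted; the small epsilon
    # guards against floating-point overshoot in the product.
    needed = max(1, math.ceil(target_tpr * len(ranked) - 1e-9))
    return ranked[needed - 1]


def mitigation_rate(false_scores, threshold):
    """Fraction of false triggers rejected (score below the threshold)."""
    if not false_scores:
        raise ValueError("false_scores must not be empty")
    rejected = sum(1 for s in false_scores if s < threshold)
    return rejected / len(false_scores)
```

In use, the threshold would be fixed from the scores of true triggers (for example `threshold_at_tpr(true_scores, 0.99)`) and then applied to the scores of false triggers with `mitigation_rate`. Under this reading, a result of 0.87 would correspond to the "87% of false triggers at 99% TPR" operating point reported in the abstract.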