DONUT: CTC-based Query-by-Example Keyword Spotting
Main authors: | , , |
---|---|
Format: | Article |
Language: | English |
Summary: | Keyword spotting--or wakeword detection--is an essential feature for
hands-free operation of modern voice-controlled devices. With such devices
becoming ubiquitous, users might want to choose a personalized custom wakeword.
In this work, we present DONUT, a CTC-based algorithm for online
query-by-example keyword spotting that enables custom wakeword detection. The
algorithm works by recording a small number of training examples from the user,
generating a set of label sequence hypotheses from these training examples, and
detecting the wakeword by aggregating the scores of all the hypotheses given a
new audio recording. Our method combines the generalization and
interpretability of CTC-based keyword spotting with the user-adaptation and
convenience of a conventional query-by-example system. DONUT has low
computational requirements and is well-suited for both learning and inference
on embedded systems without requiring private user data to be uploaded to the
cloud. |
---|---|
DOI: | 10.48550/arxiv.1811.10736 |
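
The detection step described in the summary (scoring a new recording against a set of label sequence hypotheses and aggregating the scores) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `ctc_log_likelihood` and `wakeword_score` are hypothetical, the aggregation used here (log of the mean likelihood over hypotheses) is an assumption that may differ from the paper's scoring rule, and the per-frame log-probabilities are assumed to come from some acoustic model that is not shown.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_log_likelihood(log_probs, labels, blank=0):
    """Return log P(labels | audio) using the standard CTC forward recursion.

    log_probs[t][k] is the log-probability of symbol k at frame t
    (assumed output of an acoustic model, not part of this sketch).
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    size = len(ext)

    # Frame 0: a path may start with the leading blank or the first label.
    alpha = [NEG_INF] * size
    alpha[0] = log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * size
        for s in range(size):
            candidates = [alpha[s]]            # stay on the same symbol
            if s >= 1:
                candidates.append(alpha[s - 1])  # advance by one position
            # Skipping a blank is allowed only between two distinct labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new[s] = logsumexp(candidates) + log_probs[t][ext[s]]
        alpha = new

    # A valid path ends on the last label or on the trailing blank.
    return logsumexp(alpha[-2:])


def wakeword_score(log_probs, hypotheses):
    """Aggregate hypothesis scores as the log of their mean likelihood.

    The equal weighting is an assumption made for illustration only.
    """
    scores = [ctc_log_likelihood(log_probs, h) for h in hypotheses]
    return logsumexp(scores) - math.log(len(scores))
```

In use, a device would compare `wakeword_score(log_probs, hypotheses)` against a tuned threshold, where `hypotheses` are the label sequences derived from the user's enrollment recordings; how those hypotheses are generated and how the threshold is chosen are outside the scope of this sketch.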