Efficient keyword spotting using time delay neural networks
Main authors: | , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | This paper describes a novel method of live keyword spotting using a
two-stage time delay neural network. The model is trained using transfer
learning: initial training with phone targets from a large speech corpus is
followed by training with keyword targets from a smaller data set. The accuracy
of the system is evaluated on two separate tasks. The first is the freely
available Google Speech Commands dataset. The second is an in-house task
specifically developed for keyword spotting. The results show significant
improvements in false accept and false reject rates in both clean and noisy
environments when compared with previously known techniques. Furthermore, we
investigate various techniques to reduce computation in terms of
multiplications per second of audio. Compared to recently published work, the
proposed system provides up to 89% savings on computational complexity. |
---|---|
DOI: | 10.48550/arxiv.1807.04353 |