Broadcasted Residual Learning for Efficient Keyword Spotting
Keyword spotting is an important research field because it plays a key role in device wake-up and user interaction on smart devices. However, it is challenging to minimize errors while operating efficiently in devices with limited resources such as mobile phones. We present a broadcasted residual le...
Gespeichert in:
Hauptverfasser: | , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Keyword spotting is an important research field because it plays a key role
in device wake-up and user interaction on smart devices. However, it is
challenging to minimize errors while operating efficiently in devices with
limited resources such as mobile phones. We present a broadcasted residual
learning method to achieve high accuracy with small model size and
computational load. Our method configures most of the residual functions as 1D
temporal convolution while still allows 2D convolution together using a
broadcasted-residual connection that expands temporal output to
frequency-temporal dimension. This residual mapping enables the network to
effectively represent useful audio features with much less computation than
conventional convolutional neural networks. We also propose a novel network
architecture, Broadcasting-residual network (BC-ResNet), based on broadcasted
residual learning and describe how to scale up the model according to the
target device's resources. BC-ResNets achieve state-of-the-art 98.0% and 98.7%
top-1 accuracy on Google speech command datasets v1 and v2, respectively, and
consistently outperform previous approaches, using fewer computations and
parameters. Code is available at
https://github.com/Qualcomm-AI-research/bcresnet. |
---|---|
DOI: | 10.48550/arxiv.2106.04140 |