Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Few-shot keyword spotting (FS-KWS) models usually require large-scale
annotated datasets to generalize to unseen target keywords. However, existing
KWS datasets are limited in scale, and gathering keyword-like labeled data is
a costly undertaking. To mitigate this issue, we propose a framework that uses
easily collectible, unlabeled reading speech data as an auxiliary source.
Self-supervised learning has been widely adopted for learning representations
from unlabeled data; however, it is known to be suitable for large models with
enough capacity and is not practical for training a small footprint FS-KWS
model. Instead, we automatically annotate and filter the data to construct a
keyword-like dataset, LibriWord, enabling supervision on auxiliary data. We
then adopt multi-task learning that helps the model to enhance the
representation power from out-of-domain auxiliary data. Our method notably
improves the performance over competitive methods in the FS-KWS benchmark. |
---|---|
DOI: | 10.48550/arxiv.2309.00647 |