Incremental Layer-wise Self-Supervised Learning for Efficient Speech Domain Adaptation On Device
Saved in:
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Streaming end-to-end speech recognition models have been widely applied to
mobile devices and show significant improvement in efficiency. These models are
typically trained on the server using transcribed speech data. However, the
server data distribution can be very different from the data distribution on
user devices, which could affect the model performance. There are two main
challenges for on-device training: limited reliable labels and limited training
memory. While self-supervised learning algorithms can mitigate the mismatch
between domains using unlabeled data, they are not directly applicable on mobile
devices because of the memory constraint. In this paper, we propose an
incremental layer-wise self-supervised learning algorithm for efficient speech
domain adaptation on mobile devices, in which only one layer is updated at a
time. Extensive experimental results demonstrate that the proposed algorithm
obtains a Word Error Rate (WER) on the target domain $24.2\%$ better than the
supervised baseline and costs $89.7\%$ less training memory than the end-to-end
self-supervised learning algorithm.
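To illustrate the "only one layer is updated at a time" idea from the abstract, below is a minimal sketch in PyTorch. It is not the authors' implementation: the stack of linear layers standing in for the streaming encoder, the placeholder self-supervised loss, and the synthetic batch are all hypothetical; only the layer-wise freeze-and-update schedule reflects the technique described.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a streaming ASR encoder: a small stack of layers.
encoder = nn.ModuleList([nn.Linear(80, 80) for _ in range(4)])

def self_supervised_loss(features: torch.Tensor) -> torch.Tensor:
    # Placeholder objective; a real system would use e.g. masked prediction.
    return features.pow(2).mean()

def train_one_layer(layer_idx: int, batch: torch.Tensor, steps: int = 10) -> None:
    """Update only the parameters of layer `layer_idx`; every other layer
    stays frozen, so gradients and optimizer state exist for one layer only."""
    for i, layer in enumerate(encoder):
        for p in layer.parameters():
            p.requires_grad = (i == layer_idx)
    optimizer = torch.optim.SGD(encoder[layer_idx].parameters(), lr=1e-3)
    for _ in range(steps):
        x = batch
        for i, layer in enumerate(encoder):
            if i < layer_idx:
                # Frozen prefix: run under no_grad so no activation graph
                # is retained for these layers.
                with torch.no_grad():
                    x = layer(x)
            else:
                x = layer(x)
        loss = self_supervised_loss(x)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Incremental schedule: sweep through the encoder, adapting one layer at a time.
batch = torch.randn(8, 100, 80)  # (batch, frames, features), synthetic data
for idx in range(len(encoder)):
    train_one_layer(idx, batch)
```

Under these assumptions, the memory saving comes from two places: the frozen prefix is run without building an autograd graph, and only a single layer's gradients and optimizer state are ever resident, which is consistent with the abstract's claim of training with far less memory than end-to-end self-supervised learning.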
DOI: 10.48550/arxiv.2110.00155