Knowledge distillation from language model to acoustic model: a hierarchical multi-task learning approach
Format: Article
Language: English
Abstract: The remarkable performance of pre-trained language models (LMs) trained with self-supervised learning has led to a major paradigm shift in natural language processing research. In line with these changes, improving speech recognition systems by leveraging massive deep-learning-based LMs is a major topic of speech recognition research. Among the various methods of applying LMs to speech recognition systems, in this paper we focus on a cross-modal knowledge distillation method that transfers knowledge between two types of deep neural networks with different modalities. We propose an acoustic model structure with multiple auxiliary output layers for cross-modal distillation and demonstrate that the proposed method effectively compensates for the shortcomings of the existing label-interpolation-based distillation method. In addition, we extend the proposed method to a hierarchical distillation method using LMs trained on different units (senones, monophones, and subwords) and show the effectiveness of the hierarchical distillation method through an ablation study.
DOI: 10.48550/arxiv.2110.10429
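The label-interpolation-based distillation the abstract refers to builds a soft training target by mixing the hard (one-hot) label with the teacher LM's temperature-softened output distribution, and trains the acoustic model against that mixture with cross-entropy. The sketch below is illustrative only; the interpolation weight `lam`, the temperature, and all toy values are assumptions, not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of logits.
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def interpolated_target(one_hot, teacher_logits, lam=0.5, temperature=2.0):
    # Label-interpolation distillation target:
    # (1 - lam) * hard label + lam * softened teacher (LM) distribution.
    soft = softmax(teacher_logits, temperature)
    return [(1 - lam) * h + lam * s for h, s in zip(one_hot, soft)]

def cross_entropy(target, student_probs, eps=1e-12):
    # Cross-entropy of the student (acoustic model) output
    # against the interpolated soft target.
    return -sum(t * math.log(p + eps) for t, p in zip(target, student_probs))

# Toy 3-class example: correct class is index 0.
hard = [1.0, 0.0, 0.0]              # hypothetical hard label
teacher_logits = [2.0, 1.0, -1.0]   # hypothetical LM logits
student_probs = softmax([1.5, 0.5, 0.0])  # hypothetical AM output

target = interpolated_target(hard, teacher_logits, lam=0.5)
loss = cross_entropy(target, student_probs)
```

The paper's proposed method instead attaches multiple auxiliary output layers to the acoustic model so that distillation from LMs trained on different units (senones, monophones, subwords) can be applied at separate layers rather than through a single interpolated label.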