Sampling-Based Pruned Knowledge Distillation for Training Lightweight RNN-T
Published in: | IEEE Signal Processing Letters, 2025, Vol. 32, p. 631-635 |
---|---|
Format: | Article |
Language: | English |
Abstract: | We present a novel training method for small-scale RNN-T models, widely used in real-world speech recognition applications. Despite efforts to scale down models for edge devices, the demand for even smaller and more compact speech recognition models persists to accommodate a broader range of devices. In this letter, we propose Sampling-based Pruned Knowledge Distillation (SP-KD) for training lightweight RNN-T models. In contrast to conventional knowledge distillation techniques, the proposed method enables student models to distill knowledge from the distribution of teacher models, which is estimated by considering not only the best paths but also less likely paths. Additionally, we leverage pruning of the RNN-T output lattice to comprehensively transfer knowledge from teacher models to student models. Experimental results demonstrate that our proposed method outperforms the baseline in training tiny RNN-T models. |
---|---|
ISSN: | 1070-9908, 1558-2361 |
DOI: | 10.1109/LSP.2025.3528364 |
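The abstract describes two ingredients: estimating the teacher distribution from sampled paths beyond the 1-best, and pruning the RNN-T output lattice before distilling. The sketch below is a loose, hypothetical illustration of that idea only — the function name `sp_kd_loss`, the flattened lattice representation, and the importance heuristic are all assumptions for illustration, not the paper's actual algorithm (which would operate on the full RNN-T T×U lattice).

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sp_kd_loss(teacher_logits, student_logits, n_samples=4, keep_frac=0.5, rng=None):
    """Hypothetical sketch of sampling-based pruned KD on a flattened
    lattice of shape (num_lattice_nodes, vocab_size).

    1. Sample token draws from the teacher at each node to estimate which
       nodes carry probability mass (a stand-in for considering best and
       less likely paths).
    2. Prune: retain only the highest-mass fraction of nodes.
    3. Distill: cross-entropy of the student under the teacher
       distribution, averaged over the retained nodes.
    """
    rng = rng or np.random.default_rng(0)
    t_probs = softmax(teacher_logits)
    s_logprobs = np.log(softmax(student_logits) + 1e-12)
    # crude node-importance estimate: teacher's max token probability,
    # reinforced by the mass of sampled (possibly non-best) tokens
    importance = t_probs.max(axis=-1)
    for _ in range(n_samples):
        draws = np.array([rng.choice(len(p), p=p) for p in t_probs])
        importance += t_probs[np.arange(len(t_probs)), draws]
    # prune the lattice: keep the top keep_frac fraction of nodes
    k = max(1, int(len(importance) * keep_frac))
    keep = np.argsort(importance)[-k:]
    # KD term on the pruned lattice
    return float(-(t_probs[keep] * s_logprobs[keep]).sum(axis=-1).mean())
```

Because the loss is a cross-entropy against the teacher, it is minimized when the student matches the teacher on the retained nodes, which is the intended distillation behavior of the sketch.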