Accurate and Structured Pruning for Efficient Automatic Speech Recognition
Saved in:
Main authors:
Format: Article
Language: English
Keywords:
Online access: Order full text
Summary: Automatic Speech Recognition (ASR) has seen remarkable advances with deep neural networks such as the Transformer and the Conformer. However, these models typically have large sizes and high inference costs, which makes them difficult to deploy on resource-limited devices. In this paper, we propose a novel compression strategy that combines structured pruning and knowledge distillation to reduce the model size and inference cost of the Conformer while preserving high recognition performance. Our approach uses a set of binary masks to indicate whether each Conformer module is retained or pruned, and employs L0 regularization to learn the optimal mask values. To further improve pruning performance, we apply a layerwise distillation strategy that transfers knowledge from the unpruned model to the pruned one. Our method outperforms all pruning baselines on the widely used LibriSpeech benchmark, achieving a 50% reduction in model size and a 28% reduction in inference cost with minimal performance loss.
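The core mechanism described in the abstract, learnable binary masks over Conformer modules trained with an L0 penalty plus a layerwise distillation loss, can be sketched as follows. This is a minimal illustrative sketch in PyTorch under a hard concrete relaxation of the binary masks, not the authors' implementation; the names `HardConcreteGate` and `layerwise_distillation_loss`, the hyperparameters (`beta`, `gamma`, `zeta`, `sparsity_weight`), and the number of gated blocks are assumptions made for the example.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class HardConcreteGate(nn.Module):
    """Differentiable binary masks with an expected-L0 penalty
    (hard concrete relaxation; one gate per prunable module)."""

    def __init__(self, num_gates, beta=0.5, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(num_gates))
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self):
        if self.training:
            # Sample a relaxed Bernoulli via the reparameterization trick.
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        else:
            s = torch.sigmoid(self.log_alpha)
        # Stretch to (gamma, zeta), then clip to [0, 1] so exact zeros (pruned)
        # and exact ones (kept) can occur.
        return (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)

    def expected_l0(self):
        # Probability that each gate is non-zero; the sum is the L0 penalty.
        return torch.sigmoid(
            self.log_alpha - self.beta * math.log(-self.gamma / self.zeta)
        ).sum()


def layerwise_distillation_loss(student_hiddens, teacher_hiddens):
    """MSE between corresponding hidden states of the pruned (student)
    and unpruned (teacher) models."""
    return sum(F.mse_loss(s, t.detach()) for s, t in zip(student_hiddens, teacher_hiddens))


# Hypothetical usage: one gate per Conformer block, combined training objective.
gates = HardConcreteGate(num_gates=16)   # 16 blocks is illustrative
mask = gates()                           # per-block keep/prune weights in [0, 1]
sparsity_weight = 0.01                   # illustrative sparsity/accuracy trade-off
# total_loss = asr_loss \
#     + sparsity_weight * gates.expected_l0() \
#     + layerwise_distillation_loss(student_hiddens, teacher_hiddens)
```

In such a sketch, the mask values multiply the outputs of their modules during training; once the gates converge, modules whose gates collapse to zero can be physically removed, which is what yields the structured reduction in model size and inference cost.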
DOI: 10.48550/arxiv.2305.19549