Are Soft Prompts Good Zero-shot Learners for Speech Recognition?
Main Authors:
Format: Article
Language: English
Subjects:
Online Access: Order full text
Abstract: Large self-supervised pre-trained speech models require computationally expensive fine-tuning for downstream tasks. Soft prompt tuning offers a simple parameter-efficient alternative by utilizing minimal soft prompt guidance, enhancing portability while maintaining competitive performance. However, how and why soft prompts achieve this is not well understood. In this study, we aim to deepen our understanding of this emerging method by investigating the role of soft prompts in automatic speech recognition (ASR). Our findings highlight their role as zero-shot learners in improving ASR performance, but also show that they are vulnerable to malicious modifications. Soft prompts aid generalization but are not obligatory for inference. We also identify two primary roles of soft prompts: content refinement and noise information enhancement, the latter of which improves robustness against background noise. Additionally, we propose an effective modification to noise prompts, showing that they are capable of zero-shot adaptation to out-of-distribution noise environments.
DOI: 10.48550/arxiv.2309.09413
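To make the soft prompt tuning idea described in the abstract concrete, below is a minimal sketch, assuming a PyTorch-style frozen speech encoder. The class and parameter names (SoftPromptedEncoder, num_prompts, the stand-in transformer backbone) are illustrative assumptions, not the paper's implementation; the point is only that a small set of trainable prompt vectors is prepended to the input of a frozen backbone, so that only the prompts are updated during training.

```python
import torch
import torch.nn as nn


class SoftPromptedEncoder(nn.Module):
    """Wraps a frozen speech encoder and prepends trainable soft prompts.

    Only the prompt embeddings receive gradients; the backbone stays frozen,
    which is what makes the approach parameter-efficient.
    """

    def __init__(self, encoder: nn.Module, num_prompts: int = 16, dim: int = 768):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # freeze the pre-trained backbone
        # Trainable soft prompts, initialized with a small random scale.
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, time, dim) frame-level representations
        batch = features.size(0)
        prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        # Prepend the prompts along the time axis before encoding.
        return self.encoder(torch.cat([prompts, features], dim=1))


if __name__ == "__main__":
    # Stand-in backbone: in practice this would be a large frozen
    # pre-trained speech model rather than a small transformer.
    backbone = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True),
        num_layers=2,
    )
    model = SoftPromptedEncoder(backbone, num_prompts=16, dim=768)
    x = torch.randn(2, 100, 768)   # dummy batch of frame features
    out = model(x)                 # (2, 116, 768): prompts + frames
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(out.shape, trainable)    # only the 16 x 768 prompt weights train
```

In this setup, removing or perturbing the prompt vectors at inference time leaves the frozen backbone intact, which is consistent with the abstract's observation that soft prompts aid generalization but are not obligatory for inference, while also being a point of vulnerability to malicious modification.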