Phoneme-BERT: Joint Language Modelling of Phoneme Sequence and ASR Transcript
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Recent years have witnessed significant improvements in the ability
of ASR systems to recognize spoken utterances. However, recognition remains
challenging for noisy and out-of-domain data, where substitution and deletion
errors are prevalent in the transcribed text. These errors significantly
degrade the performance of downstream tasks. In this work, we propose a
BERT-style language model, referred to as PhonemeBERT, that jointly models the
phoneme sequence and the ASR transcript to learn phonetic-aware
representations that are robust to ASR errors. We show that PhonemeBERT can be
used on downstream tasks with phoneme sequences as additional features, and
also in a low-resource setup where only ASR transcripts are available for the
downstream tasks, with no phoneme information. We evaluate our approach
extensively by generating noisy data for three benchmark datasets (Stanford
Sentiment Treebank, TREC and ATIS) for sentiment, question and intent
classification tasks, respectively. The proposed approach comprehensively
outperforms the state-of-the-art baselines on each dataset. |
---|---|
DOI: | 10.48550/arxiv.2102.00804 |