End-to-end Speech-to-Punctuated-Text Recognition
Authors: | |
---|---|
Format: | Article |
Language: | English |
Keywords: | |
Abstract: | Conventional automatic speech recognition systems do not produce punctuation
marks, which are important for the readability of speech recognition
results. Punctuation is also needed for downstream natural language processing tasks
such as machine translation. Many studies have proposed punctuation
prediction models that insert punctuation marks into speech recognition results
as a post-processing step. However, these models do not use acoustic information
for punctuation prediction and are directly affected by speech recognition
errors. In this study, we propose an end-to-end model that takes speech as
input and outputs punctuated text. This model is expected to predict
punctuation robustly against speech recognition errors while exploiting acoustic
information. We also propose incorporating an auxiliary loss that trains the
model using the output of an intermediate layer and unpunctuated text.
In experiments, we compare the performance of the proposed model with that
of a cascaded system. The proposed model achieves higher punctuation prediction
accuracy than the cascaded system without increasing the speech recognition
error rate. We also show that multi-task learning with the intermediate output
and the unpunctuated text as its target is effective. Moreover, the
proposed model has only about one-seventh as many parameters as the cascaded
system. |
---|---|
DOI: | 10.48550/arxiv.2207.03169 |
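The abstract states only that the auxiliary loss uses the output of an intermediate layer and the unpunctuated text; it does not specify the form of either loss. The sketch below is one possible reading, not the authors' implementation: it assumes a CTC loss for both the final (punctuated) output and the intermediate (unpunctuated) output, interpolated by a hypothetical weight `aux_weight`. The punctuation set `PUNCTUATION` and the helpers `strip_punctuation`, `ctc_nll`, and `multitask_loss` are likewise hypothetical. It shows how the unpunctuated auxiliary target can be derived from the punctuated transcript and how the two losses combine for multi-task training, using a plain-Python CTC forward recursion on toy log-probabilities instead of a real network.

```python
import math
import re

# Hypothetical punctuation inventory; the abstract does not say which marks are predicted.
PUNCTUATION = ",.?!"


def strip_punctuation(text: str) -> str:
    """Derive the unpunctuated auxiliary target from a punctuated transcript."""
    no_punct = re.sub(f"[{re.escape(PUNCTUATION)}]", "", text)
    return " ".join(no_punct.split())  # collapse whitespace left behind


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def ctc_nll(log_probs, target, blank=0):
    """CTC negative log-likelihood via the forward (alpha) recursion.

    log_probs: T frames, each a list of per-symbol log-probabilities.
    target: label ids without blanks.
    """
    ext = [blank]
    for label in target:
        ext += [label, blank]  # blank-interleaved label sequence
    S = len(ext)
    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for frame in log_probs[1:]:
        new = [-math.inf] * S
        for s in range(S):
            acc = alpha[s]  # stay on the same symbol
            if s >= 1:
                acc = log_add(acc, alpha[s - 1])  # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                acc = log_add(acc, alpha[s - 2])
            new[s] = acc + frame[ext[s]]
        alpha = new
    total = alpha[S - 1]
    if S > 1:
        total = log_add(total, alpha[S - 2])  # end on the last label or the final blank
    return -total


def multitask_loss(final_log_probs, punct_ids, inter_log_probs, plain_ids, aux_weight=0.3):
    """Interpolate the main (punctuated) loss and the auxiliary (unpunctuated) loss."""
    main = ctc_nll(final_log_probs, punct_ids)
    aux = ctc_nll(inter_log_probs, plain_ids)
    return (1.0 - aux_weight) * main + aux_weight * aux
```

In a real model, `final_log_probs` would come from the last layer and `inter_log_probs` from an intermediate layer, with `punct_ids` obtained by tokenizing the punctuated transcript and `plain_ids` by tokenizing `strip_punctuation` of the same transcript. Supervising the intermediate layer only on the words lets the lower layers focus on recognition while the upper layers add punctuation, which is one plausible reason such multi-task learning helps.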