Automatic transcription of the Polish newsreel

Bibliographic details
Published in: Poznan Studies in Contemporary Linguistics 2019-06, Vol. 55 (2), pp. 183-209
Main authors: Koržinek, Danijel, Wołk, Krzysztof, Brocki, Łukasz, Marasek, Krzysztof
Format: Article
Language: English
Description
Summary: This paper describes an automatic transcription system for the Polish Newsreel, a collection of mid-to-late 20th-century news segments presented in audio and video form. The segments are characterized by archaic language and poor audio quality, which make them a demanding problem for speech recognition systems. Acoustic and language models had to be retrained using data from in-domain corpora. During the adaptation of the models, experiments were carried out to select optimal adaptation parameters. The experiments showed that adapting the speech recognition system to a narrow and clearly defined domain significantly increases its efficiency. The final word error rate obtained for this domain was 10.97%.
ISSN: 0137-2459, 1897-7499
DOI: 10.1515/psicl-2019-0008