Constructing SABeD: A Spoken Academic Belgian Dutch corpus

Bibliographic details
Main authors: Mathysen, Jolien; Vandeghinste, Vincent; Peters, Elke; Wambacq, Patrick
Format: Conference proceedings
Language: English
Description
Abstract: We present the Spoken Academic Belgian Dutch (SABeD) corpus and describe its construction. It was compiled from selected first-year bachelor lectures at higher education institutions in Flanders, as students indicate that the language used in such lectures is one of the hurdles to comprehension and academic success. We first applied automatic speech recognition to these lectures, then manually segmented them into utterances and manually corrected the automatic transcriptions. A filtered version of the resulting transcriptions was automatically punctuated and linguistically annotated with CLARIN tools and is currently available for search in the Autosearch online corpus query environment. The manual transcriptions and the ELAN files with the final annotation will soon be made available to the research community for download in the CLARIN infrastructure at http://hdl.handle.net/10032/tm-a2-w4.
ISSN:1650-3740
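Because the annotated transcriptions are to be distributed as ELAN files, a reader may want to pull time-aligned utterances out of them. The sketch below reads one tier from an ELAN EAF document using only the Python standard library. It relies on the general EAF layout (TIME_SLOT elements in TIME_ORDER, ALIGNABLE_ANNOTATION elements inside each TIER). The tier name `transcription`, the function name `read_tier` and the toy sample are hypothetical illustrations and do not reflect the actual tier structure of the SABeD release.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Toy EAF fragment for illustration only; the tier name "transcription" is a
# hypothetical assumption, not the tier naming used in the SABeD files.
SAMPLE = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="2350"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="4100"/>
  </TIME_ORDER>
  <TIER TIER_ID="transcription">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>second utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_tier(root: ET.Element, tier_id: str) -> List[Tuple[int, int, str]]:
    """Return (start_ms, end_ms, text) for each time-aligned annotation in one tier."""
    # Map time-slot IDs to millisecond offsets; slots without TIME_VALUE are unaligned.
    slots: Dict[str, int] = {}
    for slot in root.iter("TIME_SLOT"):
        value = slot.get("TIME_VALUE")
        if value is not None:
            slots[slot.get("TIME_SLOT_ID")] = int(value)

    segments: List[Tuple[int, int, str]] = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # skip annotations anchored to unaligned time slots
            text = ann.findtext("ANNOTATION_VALUE", default="").strip()
            segments.append((start, end, text))
    # Annotations are not guaranteed to appear in time order, so sort by start time.
    return sorted(segments)


if __name__ == "__main__":
    # For a real file one would use ET.parse(path).getroot() instead.
    root = ET.fromstring(SAMPLE)
    for start, end, text in read_tier(root, "transcription"):
        print(f"{start:>6}-{end:<6} {text}")
```

Sorting by start time is a deliberate choice: the EAF format references time slots by ID, so document order and temporal order need not coincide.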