Constructing SABeD: A Spoken Academic Belgian Dutch corpus
Main authors: | , , , |
---|---|
Format: | Conference proceedings |
Language: | English |
Online access: | Order full text |
Summary: | We present the Spoken Academic Belgian Dutch (SABeD) corpus and describe its construction. It was compiled from selected first-year bachelor lectures at higher education institutions in Flanders, as students indicate that the language used in such lectures is one of the hurdles for comprehension and academic success. We first applied speech recognition to these lectures, then manually segmented the output into utterances and manually corrected the automated transcription. A filtered version of the resulting transcriptions was automatically punctuated and linguistically annotated with CLARIN tools and is currently available for search in the Autosearch online corpus query environment. The manual transcriptions and the ELAN files with the final annotation will soon be made available to the research community for download in the CLARIN infrastructure at http://hdl.handle.net/10032/tm-a2-w4. |
ISSN: | 1650-3740 |