Morphosyntactic Annotation in the Technical Corpus of Galician
Published in: | Linguamática (Braga, Portugal), 2009-01, Vol.1, p.61-70 |
---|---|
Main authors: | , |
Format: | Article |
Language: | orm |
Online access: | Full text |
Summary: | The Corpus Tecnico do Galego ([CTG] Technical Corpus of Galician), developed at the University of Vigo in collaboration with the University of Santiago de Compostela and available on the Internet (www.sli.uvigo.es/CTG) for free consultation since 2006, is presented, specifying its content and size and reporting tagging and lemmatization work; the Corpus Tecnico Anotado do Galego ([CTAG] Annotated Technical Corpus of Galician) is discussed as a categorized and lemmatized version of CTG. Geoffrey Leech and Andrew Wilson's (1996) guidelines for morphosyntactic annotation were applied, incorporating proposals by Montserrat Civit (2003) for Spanish. Form-lemma-tag samples are produced for nouns, verbs, adjectives, and other parts of speech. The treatment of standard and nonstandard word forms, lexemes nonconforming to normative orthography, loanwords, abbreviations, and symbols is discussed. Samples of tagged text fragments are also included. Adapted from the source document. |
ISSN: | 1647-0818 |