Dealing With Complex Linguistic Annotations Within a Language Processing Framework

In this paper we present AWA, a general purpose Annotation Web Architecture for representing, storing, and accessing the information produced by different linguistic processors. The objective of AWA is to establish a coherent and flexible representation scheme that will be the basis for the exchange...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on audio, speech, and language processing speech, and language processing, 2009-07, Vol.17 (5), p.904-915
Hauptverfasser: Artola, Xabier, de Ilarraza, Arantza DÍaz, Soroa, Aitor, Sologaistoa, Aitor
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:In this paper we present AWA, a general purpose Annotation Web Architecture for representing, storing, and accessing the information produced by different linguistic processors. The objective of AWA is to establish a coherent and flexible representation scheme that will be the basis for the exchange and use of linguistic information. In morphologically-rich languages as Basque it is necessary to represent and provide easy access to complex phenomena such as intraword structure, declension, derivation and composition features, constituent discontinousness (in multiword expressions) and so on. AWA provides a well-suited schema to deal with these phenomena. The annotation model relies on XML technologies for data representation, storage and retrieval. Typed feature structures are used as a representation schema for linguistic analyses. A consistent underlying data model, which captures the structure and relations contained in the information to be manipulated, has been identified and implemented. AWA is integrated into LPAF, a multilayered Language Processing and Annotation Framework, whose goal is the management and integration of diverse NLP components and resources. Moreover, we introduce EULIA, an annotation tool which exploits and manipulates the data created by the linguistic processors. Two real corpora have been processed and annotated within this framework.
ISSN:1558-7916
2329-9290
1558-7924
2329-9304
DOI:10.1109/TASL.2009.2018565