Web-Based Sources for an Annotated Corpus Building and Composite Proper Name Identification

Bibliographic Details
Main Authors:	Galicia-Haro, Sofía N., Gelbukh, Alexander, Bolshakov, Igor A.
Format:	Book Chapter
Language:	English
Subjects:	Annotate Corpus; Applied sciences; Artificial intelligence; Computational Linguistics; Computer science, control theory, systems; Computer systems and distributed systems. User interface; Exact sciences and technology; Learning and adaptive systems; Name Entity Recognition; Natural Language Processing; Prepositional Phrase; Software
Online Access:	Full text

Description
Summary:	Collections of texts annotated on several levels are useful resources, but developing them for languages like Spanish requires considerable effort. In this work, we present the initial step, lexical-level annotation, in compiling an annotated Mexican corpus from Web-based sources. We also describe a method based on heterogeneous knowledge and simple Web-based sources for the proper name identification that such annotation requires. We focus on composite entities (names with coordinated constituents, names with several prepositional phrases, and names of songs, books, movies, etc.). Preliminary results are presented.
ISSN:	0302-9743 1611-3349
DOI:	10.1007/978-3-540-24681-7_14