Improving Phrase Chunking by using Contextualized Word Embeddings for a Morphologically Rich Language
Phrase chunking is an important task in various natural language processing (NLP) applications. This paper presents a neural phrase chunking for Urdu by training contextualized word representations. This work also produces an annotated corpus. The annotation has been performed by using IOB (inside-o...
Gespeichert in:
Veröffentlicht in: | Arabian journal for science and engineering (2011) 2022-08, Vol.47 (8), p.9781-9799 |
---|---|
Hauptverfasser: | , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Phrase chunking is an important task in various natural language processing (NLP) applications. This paper presents a neural phrase chunking for Urdu by training contextualized word representations. This work also produces an annotated corpus. The annotation has been performed by using IOB (inside-outside-begin) labels. Comprehensive guidelines have been developed for four phrases which are noun phrase (NP), verb phrase (VP), post-positional phrase (PP) and prepositional phrase (PRP). The annotated text has been evaluated for completeness and correctness automatically. Inter-annotator agreement has been calculated for ten percent reference corpus. A neural chunker has been developed and trained on the annotated corpus. The chunker is based on long–short- term memory networks. Transfer learning has been employed to improve the chunking results. For that purpose, context-free (Word2Vec) and contextualized (ELMo) word representations have been trained. The chunker performed with an f-score of 94.9 when trained by using third layer of ELMo embeddings. |
---|---|
ISSN: | 2193-567X 1319-8025 2191-4281 |
DOI: | 10.1007/s13369-021-06343-7 |