Improving Phrase Chunking by using Contextualized Word Embeddings for a Morphologically Rich Language

Bibliographic details
Published in:	Arabian journal for science and engineering (2011) 2022-08, Vol.47 (8), p.9781-9799
Main authors:	Ehsan, Toqeer; Khalid, Javairia; Ambreen, Saadia; Mustafa, Asad; Hussain, Sarmad
Format:	Article
Language:	eng
Subjects:	Annotations; Engineering; Humanities and Social Sciences; multidisciplinary; Natural language processing; Representations; Research Article-Computer Engineering and Computer Science; Science
Online access:	Full text

Description
Abstract:	Phrase chunking is an important task in various natural language processing (NLP) applications. This paper presents a neural phrase chunker for Urdu built by training contextualized word representations. This work also produces an annotated corpus. The annotation has been performed using IOB (inside-outside-begin) labels. Comprehensive guidelines have been developed for four phrase types: noun phrase (NP), verb phrase (VP), post-positional phrase (PP) and prepositional phrase (PRP). The annotated text has been evaluated automatically for completeness and correctness. Inter-annotator agreement has been calculated on a ten percent reference corpus. A neural chunker has been developed and trained on the annotated corpus. The chunker is based on long short-term memory (LSTM) networks. Transfer learning has been employed to improve the chunking results. For that purpose, context-free (Word2Vec) and contextualized (ELMo) word representations have been trained. The chunker achieved an F-score of 94.9 when trained using the third layer of ELMo embeddings.
ISSN:	2193-567X 1319-8025 2191-4281
DOI:	10.1007/s13369-021-06343-7
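
Note on the IOB scheme named in the abstract: the sketch below illustrates, in general terms, how phrase spans map to inside-outside-begin tags and back. It is not taken from the paper. The function names, the span representation (label, start, exclusive end) and the six-token example are assumptions made for illustration only; the NP, VP and PP labels are the ones listed in the abstract.

```python
from typing import List, Tuple

# Hypothetical span type: (phrase label, start index, exclusive end index).
Chunk = Tuple[str, int, int]


def chunks_to_iob(n_tokens: int, chunks: List[Chunk]) -> List[str]:
    """Turn non-overlapping phrase spans into one IOB tag per token."""
    tags = ["O"] * n_tokens  # tokens outside any phrase stay "O"
    for label, start, end in chunks:
        tags[start] = f"B-{label}"  # first token of the phrase
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the phrase
    return tags


def iob_to_chunks(tags: List[str]) -> List[Chunk]:
    """Recover phrase spans from a sequence of IOB tags."""
    chunks: List[Chunk] = []
    label, start = None, 0
    # A trailing "O" sentinel closes a phrase that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        prefix, _, name = tag.partition("-")
        continues = prefix == "I" and name == label
        if label is not None and not continues:
            chunks.append((label, start, i))
            label = None
        # A stray "I-" tag with no open phrase is treated as a phrase start.
        if prefix == "B" or (prefix == "I" and label is None):
            label, start = name, i
    return chunks


# Illustrative six-token sentence: NP over tokens 0-1, PP on token 2,
# token 3 outside any phrase, VP over tokens 4-5.
example_chunks = [("NP", 0, 2), ("PP", 2, 3), ("VP", 4, 6)]
example_tags = chunks_to_iob(6, example_chunks)
```

Here example_tags holds one tag per token, and iob_to_chunks(example_tags) returns the original spans, so the two representations are interchangeable for well-formed input.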