Comparative study of text representation and learning for Persian named entity recognition

Bibliographic details
Published in:	ETRI Journal 2022-10, Vol.44 (5), p.794-804
Main authors:	Abdollah Pour, Mohammad Mahdi; Momtazi, Saeedeh
Format:	Article
Language:	English
Subjects:	contextualized representation; NER; Persian language processing
Online access:	Full text

Description
Abstract:	Transformer models have had a great impact on natural language processing (NLP) in recent years by enabling outstanding and efficient contextualized language models. Recent studies have used transformer-based language models for various NLP tasks, including Persian named entity recognition (NER). However, for complex tasks such as NER, it is difficult to determine which contextualized embedding will produce the best representation. Given the lack of comparative studies on combining different contextualized pretrained models with sequence modeling classifiers, we conducted a comparative study of different classifiers and embedding models. In this paper, we use different transformer-based language models tuned with different classifiers, and we evaluate these models on the Persian NER task. We perform a comparative analysis to assess the impact of text representation and text classification methods on Persian NER performance. We train and evaluate the models on three Persian NER datasets: MoNa, Peyma, and Arman. Experimental results show that XLM-R with a linear layer and a conditional random field (CRF) layer achieved the best performance. This model achieved phrase-based F-measures of 70.04, 86.37, and 79.25 and word-based F-scores of 78, 84.02, and 89.73 on the MoNa, Peyma, and Arman datasets, respectively. These results represent state-of-the-art performance on the Persian NER task.
ISSN:	1225-6463 2233-7326
DOI:	10.4218/etrij.2021-0269