Arabic named entity recognition based on a sequence-2-sequence model with multi-head attention of transformer encoder

Bibliographic Details
Main Authors: Alsultani, Hamid Sadeq Mahdi; Aliwy, Ahmed H.
Format: Conference proceedings
Language: English
Subjects:
Online Access: Full text
Description
Summary: Recurrent neural network variants such as the gated recurrent unit (GRU) and long short-term memory (LSTM) are widely used in sequence labeling models and have shown excellent performance in Natural Language Processing (NLP) tasks, including Named Entity Recognition (NER). For Arabic language processing, most existing NER models rely on word embeddings to capture similarities between words, which makes it difficult to handle unseen words at inference time. Moreover, few Arabic NER models use an attention mechanism to enhance sequence labeling. To extend the NER state of the art for Arabic, we propose an efficient NER model that leverages the encoder block of the Transformer and adopts both word-level and character-level embeddings to address the out-of-vocabulary problem. The combined word-level and character-level embeddings are fed to a bidirectional LSTM (BiLSTM) encoder. The output of the encoder is passed to a multi-head self-attention layer that implements the Transformer encoder block, consisting of self-attention followed by a feed-forward network. Finally, a Conditional Random Fields (CRF) layer performs the classification. The proposed model was trained and evaluated on two public datasets, ANERCorp and AQMAR, and compared against recent state-of-the-art work that uses these datasets with F1-measure as the performance metric. We obtain an F1-measure of 92.40% on the merged dataset.
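The abstract describes a pipeline of word plus character embeddings, a BiLSTM encoder, a Transformer-encoder self-attention block, and a CRF classifier. The PyTorch sketch below illustrates one way such a model could be assembled; the class name ArabicNERSketch, all dimensions, and the use of a plain linear emission layer in place of the paper's CRF are illustrative assumptions rather than the authors' actual configuration.

    # Minimal sketch of the described architecture (assumed dimensions,
    # CRF replaced by a linear emission layer for brevity).
    import torch
    import torch.nn as nn

    class ArabicNERSketch(nn.Module):
        def __init__(self, word_vocab, char_vocab, num_tags,
                     word_dim=100, char_dim=25, hidden_dim=128, num_heads=4):
            super().__init__()
            self.word_emb = nn.Embedding(word_vocab, word_dim, padding_idx=0)
            self.char_emb = nn.Embedding(char_vocab, char_dim, padding_idx=0)
            # Character-level BiLSTM: one vector per word from its characters
            self.char_lstm = nn.LSTM(char_dim, char_dim, batch_first=True,
                                     bidirectional=True)
            # Word-level BiLSTM over the concatenated embeddings
            self.encoder = nn.LSTM(word_dim + 2 * char_dim, hidden_dim,
                                   batch_first=True, bidirectional=True)
            # Transformer encoder block: multi-head self-attention + feed-forward
            self.attention = nn.TransformerEncoderLayer(
                d_model=2 * hidden_dim, nhead=num_heads, batch_first=True)
            # Per-tag emission scores (a CRF layer would sit on top of these)
            self.emissions = nn.Linear(2 * hidden_dim, num_tags)

        def forward(self, word_ids, char_ids):
            # word_ids: (batch, seq_len); char_ids: (batch, seq_len, max_word_len)
            b, s, c = char_ids.shape
            chars = self.char_emb(char_ids).view(b * s, c, -1)
            _, (h, _) = self.char_lstm(chars)
            # Concatenate the final forward/backward hidden states per word
            char_repr = torch.cat([h[0], h[1]], dim=-1).view(b, s, -1)
            x = torch.cat([self.word_emb(word_ids), char_repr], dim=-1)
            x, _ = self.encoder(x)
            x = self.attention(x)
            return self.emissions(x)

    # Example: score a dummy batch of 2 sentences of length 5
    model = ArabicNERSketch(word_vocab=1000, char_vocab=60, num_tags=9)
    scores = model(torch.randint(1, 1000, (2, 5)),
                   torch.randint(1, 60, (2, 5, 12)))
    print(scores.shape)  # torch.Size([2, 5, 9])

In the full model, the emission scores produced above would be decoded jointly by a CRF layer, which enforces valid tag transitions over the whole sequence rather than classifying each token independently.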
ISSN:0094-243X
1551-7616
DOI:10.1063/5.0181993