Arabic duplicate questions detection based on contextual representation, class label matching, and structured self attention

Bibliographic details
Published in: Journal of King Saud University. Computer and information sciences 2022-06, Vol.34 (6), p.3758-3765
Main authors: Hamza, Alami; Alaoui Ouatik, Said El; Zidani, Khalid Alaoui; En-Nahnahi, Noureddine
Format: Article
Language: English
Description
Abstract: Question Answering Systems (QAS) are emerging solutions that provide exact and precise answers to natural-language questions. Duplicate Question Detection (DQD), which aims to reuse previous answers, has been shown to improve user experience and significantly reduce response time. However, few Arabic QAS integrate duplicate question detection into their workflow. In this paper, we build a DQD method based on contextual word representation, question classification, and forward/backward structured self attention. First, we extract contextual word representations with Embeddings from Language Models (ELMo) to map questions into a vector space. Next, we train two models to classify question embeddings according to two taxonomies: Hamza et al. and Li & Roth. Then, we introduce a class label matching step to filter out question pairs whose class labels differ. Finally, we propose a Bidirectional Attention Bidirectional LSTM (BiAttention BiLSTM) model that focuses only on keywords to predict whether a question pair is a duplicate or not. We also apply a data augmentation strategy based on symmetry, reflexivity, and transitivity relations to improve the generalization of our model. Various experiments are performed to evaluate the impact of question classification and the pre-processing step on the DQD model. The results show that our model performs well compared to the baseline results.
ISSN: 1319-1578, 2213-1248
DOI: 10.1016/j.jksuci.2020.11.032
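
The abstract mentions a data augmentation strategy based on symmetry, reflexivity, and transitivity. The record does not give the authors' implementation, so the following is only a minimal sketch of how such relations could expand a set of labeled question pairs. The function name `augment_pairs`, the `(q1, q2, label)` input format, and the choice to apply transitivity only to duplicate pairs are assumptions made for illustration.

```python
from itertools import combinations


def augment_pairs(pairs):
    """Expand labeled question pairs (hypothetical helper, not from the paper).

    pairs: iterable of (q1, q2, label), where label 1 = duplicate, 0 = not.
    Returns a dict mapping (q1, q2) -> label.
    """
    parent = {}  # union-find forest over questions linked by duplicate pairs

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    labeled = {}
    for q1, q2, label in pairs:
        root1, root2 = find(q1), find(q2)
        if label == 1:
            parent[root1] = root2  # merge the two duplicate groups
        else:
            # Symmetry for non-duplicates: (a, b, 0) implies (b, a, 0).
            labeled[(q1, q2)] = 0
            labeled[(q2, q1)] = 0

    # Collect the duplicate groups (connected components).
    groups = {}
    for q in list(parent):
        groups.setdefault(find(q), []).append(q)

    for members in groups.values():
        for q in members:
            labeled[(q, q)] = 1  # reflexivity: a question duplicates itself
        for a, b in combinations(members, 2):
            # Transitivity and symmetry: every pair in a group is a duplicate.
            # Assumption: this overrides a conflicting non-duplicate label.
            labeled[(a, b)] = 1
            labeled[(b, a)] = 1
    return labeled
```

For example, given the placeholder pairs `("A", "B", 1)`, `("B", "C", 1)`, and `("A", "D", 0)`, the sketch adds the duplicate pair `("A", "C")` by transitivity, the reversed copy of every pair by symmetry, and one self-pair per question by reflexivity. It does not infer non-duplicates such as `("B", "D")`; whether the original method does so is not stated in the abstract.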