Contextualized dynamic meta embeddings based on Gated CNNs and self-attention for Arabic machine translation

PurposeThe paper aims to enhance Arabic machine translation (MT) by proposing novel approaches: (1) a dimensionality reduction technique for word embeddings tailored for Arabic text, optimizing efficiency while retaining semantic information; (2) a comprehensive comparison of meta-embedding techniqu...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:International journal of intelligent computing and cybernetics 2024-07, Vol.17 (3), p.605-631
Hauptverfasser: Bensalah, Nouhaila, Ayad, Habib, Adib, Abdellah, Ibn El Farouk, Abdelhamid
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:PurposeThe paper aims to enhance Arabic machine translation (MT) by proposing novel approaches: (1) a dimensionality reduction technique for word embeddings tailored for Arabic text, optimizing efficiency while retaining semantic information; (2) a comprehensive comparison of meta-embedding techniques to improve translation quality; and (3) a method leveraging self-attention and Gated CNNs to capture token dependencies, including temporal and hierarchical features within sentences, and interactions between different embedding types. These approaches collectively aim to enhance translation quality by combining different embedding schemes and leveraging advanced modeling techniques.Design/methodology/approachRecent works on MT in general and Arabic MT in particular often pick one type of word embedding model. In this paper, we present a novel approach to enhance Arabic MT by addressing three key aspects. Firstly, we propose a new dimensionality reduction technique for word embeddings, specifically tailored for Arabic text. This technique optimizes the efficiency of embeddings while retaining their semantic information. Secondly, we conduct an extensive comparison of different meta-embedding techniques, exploring the combination of static and contextual embeddings. Through this analysis, we identify the most effective approach to improve translation quality. Lastly, we introduce a novel method that leverages self-attention and Gated convolutional neural networks (CNNs) to capture token dependencies, including temporal and hierarchical features within sentences, as well as interactions between different types of embeddings. Our experimental results demonstrate the effectiveness of our proposed approach in significantly enhancing Arabic MT performance. It outperforms baseline models with a BLEU score increase of 2 points and achieves superior results compared to state-of-the-art approaches, with an average improvement of 4.6 points across all evaluation metrics.FindingsThe proposed approaches significantly enhance Arabic MT performance. The dimensionality reduction technique improves the efficiency of word embeddings while preserving semantic information. Comprehensive comparison identifies effective meta-embedding techniques, with the contextualized dynamic meta-embeddings (CDME) model showcasing competitive results. Integration of Gated CNNs with the transformer model surpasses baseline performance, leveraging both architectures' strengths. Overall, these fin
ISSN:1756-378X
1756-3798
DOI:10.1108/IJICC-03-2024-0106