Low Resource Arabic Dialects Transformer Neural Machine Translation Improvement through Incremental Transfer of Shared Linguistic Features



Bibliographic Details
Published in: Arabian Journal for Science and Engineering (2011), 2024, Vol. 49 (9), pp. 12393-12409
Main Authors: Slim, Amel; Melouah, Ahlem
Format: Article
Language: English
Online Access: Full text
Description
Abstract: Neural machine translation (NMT) is a complex process that must cope with many grammatical difficulties. Transfer learning (TL) has emerged as a leading method in machine translation, improving accuracy by leveraging ample source data when target data are limited. Yet low-resource languages such as Arabic dialects lack substantial source data. This study aims to enable an NMT model, trained on a sparse Arabic dialect corpus, to translate a specific dialect with a limited corpus, addressing this gap. The paper introduces an incremental transfer learning approach tailored to translating low-resource languages. The method draws on several related language corpora, employing an incremental fine-tuning strategy to transfer linguistic features from a grandparent model to a child model. In this case, knowledge is transferred from a broad set of Arabic dialects to the Maghrebi dialect subset and then to specific low-resource dialects such as Algerian, Tunisian, and Moroccan, using Transformer and attentional sequence-to-sequence models. Evaluation of the proposed strategy on the Algerian, Tunisian, and Moroccan dialects demonstrates superior translation performance compared to traditional TL methods. With the Transformer model, it shows improvements of 80%, 62%, and 58% for the Algerian, Tunisian, and Moroccan dialects, respectively. Similarly, with the attentional seq2seq model, there is an enhancement of 98% in BLEU score results for the Algerian, Tunisian, and Moroccan dialects.
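The incremental strategy described in the abstract — transferring weights from a model trained on a broad set of Arabic dialects, to a Maghrebi-dialect model, then to a single low-resource dialect — can be sketched in outline. The code below is a hypothetical illustration only: the corpus names, the `train` stand-in, and the list-based "parameter" representation are assumptions for demonstrating the staged fine-tuning chain, not the authors' implementation.

```python
# Hypothetical sketch of incremental (grandparent -> parent -> child)
# transfer learning for low-resource NMT fine-tuning.

def train(params, corpus, epochs=1):
    """Stand-in for Transformer training: real code would update model
    weights; here we only record which corpora shaped the parameters,
    to make the transfer chain visible."""
    return params + [corpus] * epochs

def incremental_transfer(stages):
    """Fine-tune sequentially: each stage starts from the previous
    stage's parameters instead of a random initialization."""
    params = []  # in a real setup, randomly initialized model weights
    for corpus in stages:
        params = train(params, corpus)
    return params

# Grandparent: broad multi-dialect Arabic corpus; parent: Maghrebi
# subset; child: one specific low-resource dialect (e.g. Algerian).
chain = incremental_transfer(["arabic_dialects", "maghrebi", "algerian"])
print(chain)  # ['arabic_dialects', 'maghrebi', 'algerian']
```

In a real pipeline each `train` call would be a full fine-tuning run, with the child stage inheriting the parent checkpoint's weights; the point of the sketch is only the ordering of the stages.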
ISSN: 2193-567X; 1319-8025; 2191-4281
DOI: 10.1007/s13369-023-08543-9