Conv-ViT fusion for improved handwritten Arabic character classification

An essential aspect of pattern recognition pertains to handwriting recognition, particularly in languages with diverse character styles like Arabic. Arabic characters present a challenge due to their varied writing styles, intricate interconnections within words, and shape modifications based on pos...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Signal, image and video processing image and video processing, 2024, Vol.18 (Suppl 1), p.355-372
Hauptverfasser: Rouabhi, Sarra, Azerine, Abdennour, Tlemsani, Redouane, Essaid, Mokhtar, Idoumghar, Lhassane
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:An essential aspect of pattern recognition pertains to handwriting recognition, particularly in languages with diverse character styles like Arabic. Arabic characters present a challenge due to their varied writing styles, intricate interconnections within words, and shape modifications based on position. The complexity of Arabic calligraphy further complicates recognition, with subtle letter connections and potential distortions influenced by writing speed and individual skills. Therefore, in this paper, we provided a method to address the problem of detecting handwritten Arabic characters utilizing an ensemble of deep learning techniques. Our approach extracts hierarchical features from complicated, high-resolution pictures using pre-trained models: the Vision Transformer (ViT) and Inception ResNet V2. To enhance model performance, we present tunable lambda coefficients for the weighted arithmetic integration of the two models. Experiments conducted on the HMBD dataset, categorized into subsets based on writing positions, yielded promising results. Our ensemble model achieved robust test accuracies ranging from 89 to 98% across these subsets. Analysis revealed that remaining errors primarily stem from visual-spatial similarities between certain characters and inaccuracies in ground-truth labels. Our contribution highlights the efficacy of ensemble approaches, combining transformers and CNNs, in addressing the intricacies of handwritten Arabic recognition.
ISSN:1863-1703
1863-1711
DOI:10.1007/s11760-024-03158-5