Conv-ViT fusion for improved handwritten Arabic character classification

An essential aspect of pattern recognition pertains to handwriting recognition, particularly in languages with diverse character styles like Arabic. Arabic characters present a challenge due to their varied writing styles, intricate interconnections within words, and shape modifications based on pos...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Signal, image and video processing image and video processing, 2024, Vol.18 (Suppl 1), p.355-372
Hauptverfasser:	Rouabhi, Sarra, Azerine, Abdennour, Tlemsani, Redouane, Essaid, Mokhtar, Idoumghar, Lhassane
Format:	Artikel
Sprache:	eng
Schlagworte:	Computer Imaging Computer Science Handwriting Handwriting recognition Image Processing and Computer Vision Machine learning Multimedia Information Systems Original Paper Pattern recognition Pattern Recognition and Graphics Signal,Image and Speech Processing Vision
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	An essential aspect of pattern recognition pertains to handwriting recognition, particularly in languages with diverse character styles like Arabic. Arabic characters present a challenge due to their varied writing styles, intricate interconnections within words, and shape modifications based on position. The complexity of Arabic calligraphy further complicates recognition, with subtle letter connections and potential distortions influenced by writing speed and individual skills. Therefore, in this paper, we provided a method to address the problem of detecting handwritten Arabic characters utilizing an ensemble of deep learning techniques. Our approach extracts hierarchical features from complicated, high-resolution pictures using pre-trained models: the Vision Transformer (ViT) and Inception ResNet V2. To enhance model performance, we present tunable lambda coefficients for the weighted arithmetic integration of the two models. Experiments conducted on the HMBD dataset, categorized into subsets based on writing positions, yielded promising results. Our ensemble model achieved robust test accuracies ranging from 89 to 98% across these subsets. Analysis revealed that remaining errors primarily stem from visual-spatial similarities between certain characters and inaccuracies in ground-truth labels. Our contribution highlights the efficacy of ensemble approaches, combining transformers and CNNs, in addressing the intricacies of handwritten Arabic recognition.
ISSN:	1863-1703 1863-1711
DOI:	10.1007/s11760-024-03158-5