Conv-ViT fusion for improved handwritten Arabic character classification

Bibliographic details
Published in: Signal, image and video processing, 2024, Vol.18 (Suppl 1), p.355-372
Main authors: Rouabhi, Sarra, Azerine, Abdennour, Tlemsani, Redouane, Essaid, Mokhtar, Idoumghar, Lhassane
Format: Article
Language: English
Online access: Full text
container_end_page 372
container_issue Suppl 1
container_start_page 355
container_title Signal, image and video processing
container_volume 18
creator Rouabhi, Sarra
Azerine, Abdennour
Tlemsani, Redouane
Essaid, Mokhtar
Idoumghar, Lhassane
description Handwriting recognition is an essential area of pattern recognition, particularly for languages with diverse character styles such as Arabic. Arabic characters are challenging to recognize because of their varied writing styles, the intricate connections between letters within words, and the changes in letter shape that depend on position. The complexity of Arabic calligraphy complicates recognition further, with subtle letter connections and distortions that can arise from writing speed and individual skill. In this paper, we therefore propose a method for recognizing handwritten Arabic characters using an ensemble of deep learning models. Our approach uses two pre-trained models, the Vision Transformer (ViT) and Inception ResNet V2, to extract hierarchical features from complex, high-resolution images. To improve performance, we introduce tunable lambda coefficients for a weighted arithmetic combination of the two models. Experiments on the HMBD dataset, divided into subsets by writing position, yielded promising results: the ensemble achieved test accuracies ranging from 89% to 98% across these subsets. Error analysis showed that the remaining errors stem mainly from visual-spatial similarities between certain characters and from inaccuracies in the ground-truth labels. Our contribution highlights the effectiveness of ensembles that combine transformers and CNNs in handling the intricacies of handwritten Arabic recognition.
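The abstract describes fusing ViT and Inception ResNet V2 through a weighted arithmetic combination governed by tunable lambda coefficients, but the record does not give the formula. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: each model is assumed to output per-class softmax probabilities, the two are fused as λ·p_ViT + (1 - λ)·p_CNN with a single λ in [0, 1], and λ is chosen by grid search on a validation split. The function names (`fuse`, `tune_lambda`, `accuracy`) and the toy probabilities are hypothetical.

```python
"""Minimal sketch of weighted arithmetic fusion of two classifiers.

ASSUMPTIONS (not taken from the paper): each model yields one softmax
probability vector per image; fusion is lam * p_vit + (1 - lam) * p_cnn with
a single lam in [0, 1]; lam is tuned by grid search on a validation split.
All names and the toy data below are hypothetical.
"""
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # one probability vector per sample


def fuse(p_vit: Matrix, p_cnn: Matrix, lam: float) -> List[List[float]]:
    """Weighted arithmetic combination of the two models' class probabilities."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_v, row_c)]
        for row_v, row_c in zip(p_vit, p_cnn)
    ]


def argmax(row: Sequence[float]) -> int:
    """Index of the largest score (the first one wins on ties)."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(probs: Matrix, labels: Sequence[int]) -> float:
    """Fraction of samples whose top-scoring class matches the label."""
    correct = sum(argmax(row) == y for row, y in zip(probs, labels))
    return correct / len(labels)


def tune_lambda(p_vit: Matrix, p_cnn: Matrix, labels: Sequence[int], steps: int = 10) -> float:
    """Grid search lam over {0, 1/steps, ..., 1} and return the best value."""
    best_lam, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        lam = i / steps
        acc = accuracy(fuse(p_vit, p_cnn, lam), labels)
        if acc > best_acc:  # strict '>' keeps the smallest lam on ties
            best_lam, best_acc = lam, acc
    return best_lam


if __name__ == "__main__":
    # Toy validation outputs for 3 samples and 2 classes (illustrative only).
    p_vit = [[0.9, 0.1], [0.6, 0.4], [0.45, 0.55]]   # correct on sample 0 only
    p_cnn = [[0.35, 0.65], [0.2, 0.8], [0.7, 0.3]]  # correct on samples 1 and 2
    labels = [0, 1, 0]
    lam = tune_lambda(p_vit, p_cnn, labels)
    acc = accuracy(fuse(p_vit, p_cnn, lam), labels)
    print(f"lambda={lam}, fused accuracy={acc:.2f}")  # lambda=0.3, fused accuracy=1.00
```

A single convex weight is less restrictive than it looks: for any two non-negative coefficients λ1 and λ2 (not both zero), λ1·p_ViT + λ2·p_CNN has the same argmax as the convex combination with λ = λ1 / (λ1 + λ2), so the predicted classes, and hence the accuracy, are unchanged.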
doi_str_mv 10.1007/s11760-024-03158-5
format Article
publisher London: Springer London
stitle SIViP
rights The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
toplevel peer_reviewed
online_resources
tpages 18
linktopdf https://link.springer.com/content/pdf/10.1007/s11760-024-03158-5
linktohtml https://link.springer.com/10.1007/s11760-024-03158-5
fulltext fulltext
identifier ISSN: 1863-1703
ispartof Signal, image and video processing, 2024, Vol.18 (Suppl 1), p.355-372
issn 1863-1703
1863-1711
language eng
recordid cdi_proquest_journals_3072276308
source SpringerLink Journals - AutoHoldings
subjects Computer Imaging
Computer Science
Handwriting
Handwriting recognition
Image Processing and Computer Vision
Machine learning
Multimedia Information Systems
Original Paper
Pattern recognition
Pattern Recognition and Graphics
Signal, Image and Speech Processing
Vision
title Conv-ViT fusion for improved handwritten Arabic character classification
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T11%3A30%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Conv-ViT%20fusion%20for%20improved%20handwritten%20Arabic%20character%20classification&rft.jtitle=Signal,%20image%20and%20video%20processing&rft.au=Rouabhi,%20Sarra&rft.date=2024&rft.volume=18&rft.issue=Suppl%201&rft.spage=355&rft.epage=372&rft.pages=355-372&rft.issn=1863-1703&rft.eissn=1863-1711&rft_id=info:doi/10.1007/s11760-024-03158-5&rft_dat=%3Cproquest_cross%3E3072276308%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3072276308&rft_id=info:pmid/&rfr_iscdi=true