Conv-ViT fusion for improved handwritten Arabic character classification
An essential aspect of pattern recognition pertains to handwriting recognition, particularly in languages with diverse character styles like Arabic. Arabic characters present a challenge due to their varied writing styles, intricate interconnections within words, and shape modifications based on pos...
Published in: | Signal, image and video processing, 2024, Vol.18 (Suppl 1), p.355-372 |
---|---|
Main authors: | Rouabhi, Sarra; Azerine, Abdennour; Tlemsani, Redouane; Essaid, Mokhtar; Idoumghar, Lhassane |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 372 |
---|---|
container_issue | Suppl 1 |
container_start_page | 355 |
container_title | Signal, image and video processing |
container_volume | 18 |
creator | Rouabhi, Sarra; Azerine, Abdennour; Tlemsani, Redouane; Essaid, Mokhtar; Idoumghar, Lhassane |
description | An essential aspect of pattern recognition pertains to handwriting recognition, particularly in languages with diverse character styles like Arabic. Arabic characters present a challenge due to their varied writing styles, intricate interconnections within words, and shape modifications based on position. The complexity of Arabic calligraphy further complicates recognition, with subtle letter connections and potential distortions influenced by writing speed and individual skills. Therefore, in this paper, we provided a method to address the problem of detecting handwritten Arabic characters utilizing an ensemble of deep learning techniques. Our approach extracts hierarchical features from complicated, high-resolution pictures using pre-trained models: the Vision Transformer (ViT) and Inception ResNet V2. To enhance model performance, we present tunable lambda coefficients for the weighted arithmetic integration of the two models. Experiments conducted on the HMBD dataset, categorized into subsets based on writing positions, yielded promising results. Our ensemble model achieved robust test accuracies ranging from 89 to 98% across these subsets. Analysis revealed that remaining errors primarily stem from visual-spatial similarities between certain characters and inaccuracies in ground-truth labels. Our contribution highlights the efficacy of ensemble approaches, combining transformers and CNNs, in addressing the intricacies of handwritten Arabic recognition. |
doi_str_mv | 10.1007/s11760-024-03158-5 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3072276308</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3072276308</sourcerecordid><originalsourceid>FETCH-LOGICAL-c270t-e290def8c8b7b6f10b28549e7723ba624ae5e152b63e6912702af72035cbd8473</originalsourceid><addsrcrecordid>eNp9kE1LAzEQhoMoWGr_gKcFz9FJsvnosRS1hYKX6jUk2cSmtLs12Vb890ZX9OZcZg7vMzM8CF0TuCUA8i4TIgVgoDUGRrjC_AyNiBIME0nI-e8M7BJNct5CKUalEmqEFvOuPeGXuK7CMceurUKXqrg_pO7km2pj2uY9xb73bTVLxkZXuY1JxvU-VW5nco4hOtMX8ApdBLPLfvLTx-j54X49X-DV0-NyPlthRyX02NMpND4op6y0IhCwVPF66qWkzBpBa-O5J5xawbyYksJQEyQFxp1tVC3ZGN0Me8uLb0efe73tjqktJzUDSakUDFRJ0SHlUpdz8kEfUtyb9KEJ6C9pepCmizT9LU3zArEByiXcvvr0t_of6hPW3m5w</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3072276308</pqid></control><display><type>article</type><title>Conv-ViT fusion for improved handwritten Arabic character classification</title><source>SpringerLink Journals - AutoHoldings</source><creator>Rouabhi, Sarra ; Azerine, Abdennour ; Tlemsani, Redouane ; Essaid, Mokhtar ; Idoumghar, Lhassane</creator><creatorcontrib>Rouabhi, Sarra ; Azerine, Abdennour ; Tlemsani, Redouane ; Essaid, Mokhtar ; Idoumghar, Lhassane</creatorcontrib><description>An essential aspect of pattern recognition pertains to handwriting recognition, particularly in languages with diverse character styles like Arabic. Arabic characters present a challenge due to their varied writing styles, intricate interconnections within words, and shape modifications based on position. The complexity of Arabic calligraphy further complicates recognition, with subtle letter connections and potential distortions influenced by writing speed and individual skills. Therefore, in this paper, we provided a method to address the problem of detecting handwritten Arabic characters utilizing an ensemble of deep learning techniques. 
Our approach extracts hierarchical features from complicated, high-resolution pictures using pre-trained models: the Vision Transformer (ViT) and Inception ResNet V2. To enhance model performance, we present tunable lambda coefficients for the weighted arithmetic integration of the two models. Experiments conducted on the HMBD dataset, categorized into subsets based on writing positions, yielded promising results. Our ensemble model achieved robust test accuracies ranging from 89 to 98% across these subsets. Analysis revealed that remaining errors primarily stem from visual-spatial similarities between certain characters and inaccuracies in ground-truth labels. Our contribution highlights the efficacy of ensemble approaches, combining transformers and CNNs, in addressing the intricacies of handwritten Arabic recognition.</description><identifier>ISSN: 1863-1703</identifier><identifier>EISSN: 1863-1711</identifier><identifier>DOI: 10.1007/s11760-024-03158-5</identifier><language>eng</language><publisher>London: Springer London</publisher><subject>Computer Imaging ; Computer Science ; Handwriting ; Handwriting recognition ; Image Processing and Computer Vision ; Machine learning ; Multimedia Information Systems ; Original Paper ; Pattern recognition ; Pattern Recognition and Graphics ; Signal,Image and Speech Processing ; Vision</subject><ispartof>Signal, image and video processing, 2024, Vol.18 (Suppl 1), p.355-372</ispartof><rights>The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024. Springer Nature or its licensor (e.g. 
a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c270t-e290def8c8b7b6f10b28549e7723ba624ae5e152b63e6912702af72035cbd8473</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11760-024-03158-5$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11760-024-03158-5$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,777,781,27906,27907,41470,42539,51301</link.rule.ids></links><search><creatorcontrib>Rouabhi, Sarra</creatorcontrib><creatorcontrib>Azerine, Abdennour</creatorcontrib><creatorcontrib>Tlemsani, Redouane</creatorcontrib><creatorcontrib>Essaid, Mokhtar</creatorcontrib><creatorcontrib>Idoumghar, Lhassane</creatorcontrib><title>Conv-ViT fusion for improved handwritten Arabic character classification</title><title>Signal, image and video processing</title><addtitle>SIViP</addtitle><description>An essential aspect of pattern recognition pertains to handwriting recognition, particularly in languages with diverse character styles like Arabic. Arabic characters present a challenge due to their varied writing styles, intricate interconnections within words, and shape modifications based on position. The complexity of Arabic calligraphy further complicates recognition, with subtle letter connections and potential distortions influenced by writing speed and individual skills. 
Therefore, in this paper, we provided a method to address the problem of detecting handwritten Arabic characters utilizing an ensemble of deep learning techniques. Our approach extracts hierarchical features from complicated, high-resolution pictures using pre-trained models: the Vision Transformer (ViT) and Inception ResNet V2. To enhance model performance, we present tunable lambda coefficients for the weighted arithmetic integration of the two models. Experiments conducted on the HMBD dataset, categorized into subsets based on writing positions, yielded promising results. Our ensemble model achieved robust test accuracies ranging from 89 to 98% across these subsets. Analysis revealed that remaining errors primarily stem from visual-spatial similarities between certain characters and inaccuracies in ground-truth labels. Our contribution highlights the efficacy of ensemble approaches, combining transformers and CNNs, in addressing the intricacies of handwritten Arabic recognition.</description><subject>Computer Imaging</subject><subject>Computer Science</subject><subject>Handwriting</subject><subject>Handwriting recognition</subject><subject>Image Processing and Computer Vision</subject><subject>Machine learning</subject><subject>Multimedia Information Systems</subject><subject>Original Paper</subject><subject>Pattern recognition</subject><subject>Pattern Recognition and Graphics</subject><subject>Signal,Image and Speech 
Processing</subject><subject>Vision</subject><issn>1863-1703</issn><issn>1863-1711</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9kE1LAzEQhoMoWGr_gKcFz9FJsvnosRS1hYKX6jUk2cSmtLs12Vb890ZX9OZcZg7vMzM8CF0TuCUA8i4TIgVgoDUGRrjC_AyNiBIME0nI-e8M7BJNct5CKUalEmqEFvOuPeGXuK7CMceurUKXqrg_pO7km2pj2uY9xb73bTVLxkZXuY1JxvU-VW5nco4hOtMX8ApdBLPLfvLTx-j54X49X-DV0-NyPlthRyX02NMpND4op6y0IhCwVPF66qWkzBpBa-O5J5xawbyYksJQEyQFxp1tVC3ZGN0Me8uLb0efe73tjqktJzUDSakUDFRJ0SHlUpdz8kEfUtyb9KEJ6C9pepCmizT9LU3zArEByiXcvvr0t_of6hPW3m5w</recordid><startdate>2024</startdate><enddate>2024</enddate><creator>Rouabhi, Sarra</creator><creator>Azerine, Abdennour</creator><creator>Tlemsani, Redouane</creator><creator>Essaid, Mokhtar</creator><creator>Idoumghar, Lhassane</creator><general>Springer London</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>2024</creationdate><title>Conv-ViT fusion for improved handwritten Arabic character classification</title><author>Rouabhi, Sarra ; Azerine, Abdennour ; Tlemsani, Redouane ; Essaid, Mokhtar ; Idoumghar, Lhassane</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c270t-e290def8c8b7b6f10b28549e7723ba624ae5e152b63e6912702af72035cbd8473</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Imaging</topic><topic>Computer Science</topic><topic>Handwriting</topic><topic>Handwriting recognition</topic><topic>Image Processing and Computer Vision</topic><topic>Machine learning</topic><topic>Multimedia Information Systems</topic><topic>Original Paper</topic><topic>Pattern recognition</topic><topic>Pattern Recognition and Graphics</topic><topic>Signal,Image and Speech Processing</topic><topic>Vision</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Rouabhi, 
Sarra</creatorcontrib><creatorcontrib>Azerine, Abdennour</creatorcontrib><creatorcontrib>Tlemsani, Redouane</creatorcontrib><creatorcontrib>Essaid, Mokhtar</creatorcontrib><creatorcontrib>Idoumghar, Lhassane</creatorcontrib><collection>CrossRef</collection><jtitle>Signal, image and video processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Rouabhi, Sarra</au><au>Azerine, Abdennour</au><au>Tlemsani, Redouane</au><au>Essaid, Mokhtar</au><au>Idoumghar, Lhassane</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Conv-ViT fusion for improved handwritten Arabic character classification</atitle><jtitle>Signal, image and video processing</jtitle><stitle>SIViP</stitle><date>2024</date><risdate>2024</risdate><volume>18</volume><issue>Suppl 1</issue><spage>355</spage><epage>372</epage><pages>355-372</pages><issn>1863-1703</issn><eissn>1863-1711</eissn><abstract>An essential aspect of pattern recognition pertains to handwriting recognition, particularly in languages with diverse character styles like Arabic. Arabic characters present a challenge due to their varied writing styles, intricate interconnections within words, and shape modifications based on position. The complexity of Arabic calligraphy further complicates recognition, with subtle letter connections and potential distortions influenced by writing speed and individual skills. Therefore, in this paper, we provided a method to address the problem of detecting handwritten Arabic characters utilizing an ensemble of deep learning techniques. Our approach extracts hierarchical features from complicated, high-resolution pictures using pre-trained models: the Vision Transformer (ViT) and Inception ResNet V2. To enhance model performance, we present tunable lambda coefficients for the weighted arithmetic integration of the two models. 
Experiments conducted on the HMBD dataset, categorized into subsets based on writing positions, yielded promising results. Our ensemble model achieved robust test accuracies ranging from 89 to 98% across these subsets. Analysis revealed that remaining errors primarily stem from visual-spatial similarities between certain characters and inaccuracies in ground-truth labels. Our contribution highlights the efficacy of ensemble approaches, combining transformers and CNNs, in addressing the intricacies of handwritten Arabic recognition.</abstract><cop>London</cop><pub>Springer London</pub><doi>10.1007/s11760-024-03158-5</doi><tpages>18</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1863-1703 |
ispartof | Signal, image and video processing, 2024, Vol.18 (Suppl 1), p.355-372 |
issn | 1863-1703; 1863-1711 |
language | eng |
recordid | cdi_proquest_journals_3072276308 |
source | SpringerLink Journals - AutoHoldings |
subjects | Computer Imaging; Computer Science; Handwriting; Handwriting recognition; Image Processing and Computer Vision; Machine learning; Multimedia Information Systems; Original Paper; Pattern recognition; Pattern Recognition and Graphics; Signal, Image and Speech Processing; Vision |
title | Conv-ViT fusion for improved handwritten Arabic character classification |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T11%3A30%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Conv-ViT%20fusion%20for%20improved%20handwritten%20Arabic%20character%20classification&rft.jtitle=Signal,%20image%20and%20video%20processing&rft.au=Rouabhi,%20Sarra&rft.date=2024&rft.volume=18&rft.issue=Suppl%201&rft.spage=355&rft.epage=372&rft.pages=355-372&rft.issn=1863-1703&rft.eissn=1863-1711&rft_id=info:doi/10.1007/s11760-024-03158-5&rft_dat=%3Cproquest_cross%3E3072276308%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3072276308&rft_id=info:pmid/&rfr_iscdi=true |
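
The description field mentions "tunable lambda coefficients for the weighted arithmetic integration of the two models". One plausible reading is a weighted average of the class probabilities produced by the ViT and by Inception ResNet V2. The sketch below illustrates that reading only; it is not taken from the paper. The function names, the parameter names `lambda_vit` and `lambda_cnn`, the normalization by the sum of the weights, and the toy scores are all assumptions made for illustration.

```python
# Illustrative sketch only; not the authors' implementation.
# Assumption: each model outputs a probability vector over the same classes,
# and the ensemble is a weighted arithmetic mean with two tunable weights.
# All names here (fuse_probabilities, predict, lambda_vit, lambda_cnn) are hypothetical.

def fuse_probabilities(p_vit, p_cnn, lambda_vit=0.5, lambda_cnn=0.5):
    """Return the weighted arithmetic mean of two per-class probability vectors."""
    if len(p_vit) != len(p_cnn):
        raise ValueError("both models must score the same set of classes")
    total = lambda_vit + lambda_cnn
    if total <= 0:
        raise ValueError("lambda coefficients must sum to a positive value")
    # Dividing by the sum of the weights keeps the result a probability vector.
    return [(lambda_vit * a + lambda_cnn * b) / total for a, b in zip(p_vit, p_cnn)]


def predict(p_vit, p_cnn, lambda_vit=0.5, lambda_cnn=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_probabilities(p_vit, p_cnn, lambda_vit, lambda_cnn)
    return max(range(len(fused)), key=fused.__getitem__)


# Toy three-class example with made-up scores (not data from the paper).
p_vit = [0.6, 0.3, 0.1]
p_cnn = [0.2, 0.7, 0.1]
print(predict(p_vit, p_cnn, lambda_vit=0.8, lambda_cnn=0.2))
print(predict(p_vit, p_cnn, lambda_vit=0.3, lambda_cnn=0.7))
```

In this toy case the two models disagree, so the predicted class follows whichever model carries the larger weight. In practice the coefficients would presumably be tuned on held-out data, which the record does not describe.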