Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition

Bibliographic Details
Authors: Zhou, Bangbang; Qu, Yadong; Wang, Zixiao; Li, Zicheng; Zhang, Boqiang; Xie, Hongtao
Format: Article
Language: English
Subjects: Computer Science - Computer Vision and Pattern Recognition
Source: arXiv.org
DOI: 10.48550/arxiv.2407.05562
Date: 2024-07-07

Abstract

Recently, scene text recognition (STR) models have shown significant performance improvements. However, existing models still struggle with challenging texts that contain severely distorted and perspective characters. Such texts mainly cause two problems: (1) large intra-class variance and (2) small inter-class variance. An extremely distorted character may differ markedly in appearance from other characters of the same category, while the visual difference between characters of different classes can be relatively small. To address these issues, we propose a novel method that enriches character features to enhance the discriminability of characters. First, we propose the Character-Aware Constraint Encoder (CACE), which is built from multiple stacked blocks. Each block introduces a decay matrix that explicitly guides the attention region of each token. By applying the decay matrix block after block, CACE enables tokens to perceive morphological information at the character level. Second, an Intra-Inter Consistency Loss (I^2CL) is introduced to enforce intra-class compactness and inter-class separability in the feature space. I^2CL improves the discriminative capability of features by learning a long-term memory unit for each character category. Trained with synthetic data, our model achieves state-of-the-art performance on common benchmarks (94.1% accuracy) and the Union14M-Benchmark (61.6% accuracy). Code is available at https://github.com/bang123-box/CFE.
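
The abstract only sketches how CACE's decay matrix constrains attention. The snippet below is a minimal, hypothetical PyTorch illustration of one plausible reading: an additive, distance-based bias on the self-attention logits so each token attends mostly to a local, character-sized region. The function and class names, the gamma parameter, and the exact decay form are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of a decay-biased self-attention block, loosely inspired by
# the abstract's description of CACE. The decay form and all names are assumptions.
import math

import torch
import torch.nn as nn


def decay_bias(seq_len: int, gamma: float = 0.9) -> torch.Tensor:
    """Additive attention bias of shape [seq_len, seq_len].

    Entry (i, j) is |i - j| * log(gamma), so after the softmax the weight between
    two tokens decays roughly as gamma ** |i - j|, confining each token's attention
    to a local region (one plausible reading of the decay matrix).
    """
    idx = torch.arange(seq_len)
    dist = (idx[:, None] - idx[None, :]).abs().float()
    return dist * math.log(gamma)


class DecayAttentionBlock(nn.Module):
    """One encoder block whose self-attention logits are biased by the decay matrix."""

    def __init__(self, dim: int, num_heads: int = 8, gamma: float = 0.9):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.gamma = gamma

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [batch, seq_len, dim]
        bias = decay_bias(x.size(1), self.gamma).to(x.device)
        out, _ = self.attn(x, x, x, attn_mask=bias)  # float mask is added to the logits
        return self.norm(x + out)
```

Stacking several such blocks, as the abstract describes for CACE, would apply the bias repeatedly, so deeper tokens keep a character-local receptive field while still exchanging information across the sequence.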
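The abstract states that I^2CL learns a long-term memory unit for each character category and enforces intra-class compactness and inter-class separability. The sketch below is one plausible realization under assumptions not stated in the abstract: an EMA-updated memory vector per class, a cosine pull toward the own-class memory, and a hinge on the hardest non-target memory. The update rule, the margin, and all names are illustrative, not the paper's I^2CL.

```python
# Hypothetical sketch of an intra-inter consistency objective with one
# EMA-updated memory vector per character class; details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class IntraInterConsistencyLoss(nn.Module):
    def __init__(self, num_classes: int, dim: int, momentum: float = 0.9, margin: float = 0.5):
        super().__init__()
        # Long-term memory: one unit-norm vector per character category.
        self.register_buffer("memory", F.normalize(torch.randn(num_classes, dim), dim=1))
        self.momentum = momentum
        self.margin = margin

    @torch.no_grad()
    def _update_memory(self, feats: torch.Tensor, labels: torch.Tensor) -> None:
        # Exponential moving average of the features seen for each class.
        for c in labels.unique():
            mean = feats[labels == c].mean(dim=0)
            self.memory[c] = F.normalize(
                self.momentum * self.memory[c] + (1 - self.momentum) * mean, dim=0
            )

    def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # feats: [N, dim] character features; labels: [N] int64 class indices.
        feats = F.normalize(feats, dim=1)
        # Intra-class term: pull each feature toward the memory of its own class.
        intra = (1.0 - (feats * self.memory[labels]).sum(dim=1)).mean()
        # Inter-class term: hinge on the hardest non-target memory similarity.
        sims = feats @ self.memory.t()                                  # [N, num_classes]
        mask = F.one_hot(labels, num_classes=self.memory.size(0)).bool()
        hardest_negative = sims.masked_fill(mask, float("-inf")).max(dim=1).values
        inter = F.relu(hardest_negative - self.margin).mean()
        self._update_memory(feats, labels)
        return intra + inter
```

In such a setup the features would be per-character embeddings from the encoder, and this loss would typically be added, with a weighting factor, to the usual recognition objective rather than used alone.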