Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Recently, scene text recognition (STR) models have shown significant
performance improvements. However, existing models still encounter difficulties
in recognizing challenging texts, for example those with severely distorted
or perspective-warped characters. These challenging texts mainly cause two
problems: (1) Large Intra-Class Variance. (2) Small Inter-Class Variance. An
extremely distorted character may prominently differ visually from other
characters within the same category, while the variance between characters from
different classes is relatively small. To address the above issues, we propose
a novel method that enriches the character features to enhance the
discriminability of characters. Firstly, we propose the Character-Aware
Constraint Encoder (CACE), which stacks multiple blocks. CACE introduces a decay
matrix in each block to explicitly guide the attention region for each token.
By continuously employing the decay matrix, CACE enables tokens to perceive
morphological information at the character level. Secondly, an Intra-Inter
Consistency Loss (I^2CL) is introduced to consider intra-class compactness and
inter-class separability in feature space. I^2CL improves the discriminative
capability of features by learning a long-term memory unit for each character
category. Trained with synthetic data, our model achieves state-of-the-art
performance on common benchmarks (94.1% accuracy) and Union14M-Benchmark (61.6%
accuracy). Code is available at https://github.com/bang123-box/CFE. |
---|---|
DOI: | 10.48550/arxiv.2407.05562 |
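
The summary says that a decay matrix guides the attention region of each token, but this record does not give its form. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes tokens lie on a 2D patch grid and that the decay is `gamma` raised to the Manhattan distance between two tokens. The names `decay_matrix`, `decayed_attention`, and `gamma` are hypothetical and chosen only for illustration.

```python
import math


def decay_matrix(height, width, gamma=0.9):
    """Assumed form: D[i][j] = gamma ** (Manhattan distance between tokens i and j)
    for tokens laid out row by row on a height x width grid."""
    coords = [(r, c) for r in range(height) for c in range(width)]
    return [
        [gamma ** (abs(r1 - r2) + abs(c1 - c2)) for (r2, c2) in coords]
        for (r1, c1) in coords
    ]


def decayed_attention(scores, decay):
    """For each query row, compute weights proportional to exp(score) * decay
    and normalize them so that the row sums to 1."""
    attention = []
    for score_row, decay_row in zip(scores, decay):
        row_max = max(score_row)  # subtract the max for numerical stability
        weights = [math.exp(s - row_max) * d for s, d in zip(score_row, decay_row)]
        total = sum(weights)
        attention.append([w / total for w in weights])
    return attention


# Example: a 2 x 3 grid of tokens with identical raw scores, so the
# resulting attention pattern is determined by the decay alone.
D = decay_matrix(2, 3)
uniform_scores = [[0.0] * 6 for _ in range(6)]
A = decayed_attention(uniform_scores, D)
```

With identical raw scores, each token attends most strongly to itself and progressively less to tokens farther away on the grid, which is one way a fixed decay term can bias attention toward a local, character-sized region.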