SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting
Main authors: | , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Abstract: | End-to-end scene text spotting, which aims to read the text in natural
images, has garnered significant attention in recent years. However, recent
state-of-the-art methods usually incorporate detection and recognition simply
by sharing the backbone, which does not directly take advantage of the feature
interaction between the two tasks. In this paper, we propose a new end-to-end
scene text spotting framework termed SwinTextSpotter v2, which seeks to find a
better synergy between text detection and recognition. Specifically, we enhance
the relationship between the two tasks using novel Recognition Conversion and
Recognition Alignment modules. Recognition Conversion explicitly guides text
localization through recognition loss, while Recognition Alignment dynamically
extracts text features for recognition through the detection predictions. This
simple yet effective design results in a concise framework that requires
neither an additional rectification module nor character-level annotations for
the arbitrarily-shaped text. Furthermore, the parameters of the detector are
greatly reduced without performance degradation by introducing a Box Selection
Schedule. Qualitative and quantitative experiments demonstrate that
SwinTextSpotter v2 achieves state-of-the-art performance on various
multilingual (English, Chinese, and Vietnamese) benchmarks. The code will be
available at https://github.com/mxin262/SwinTextSpotterv2. |
---|---|
DOI: | 10.48550/arxiv.2401.07641 |