Towards complex scene text reading with selective region proposal and two-stage deep reinforcement learning
Published in: | Applied soft computing 2025-02, Vol.170, p.112701, Article 112701 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | The challenging task of accurately detecting and recognizing text from images captured in intricate real-world settings is known as text reading. The inherent complexity and variability of real-world scenes often make conventional optical character recognition systems ineffective. Furthermore, although deep learning-based systems perform well with horizontal text in natural scenes, they frequently encounter difficulties with oriented text. To address these limitations, we propose a multi-oriented scene text reading framework utilizing a selective region proposal technique based on Scale-Invariant Feature Transform keypoints. This approach focuses on relevant text regions, enhancing efficiency and precision. We further refine text localization through bounding box regression. A two-stage deep reinforcement learning framework, incorporating character/word awareness, is employed to correct and align detected characters, and to validate annotated words. This framework utilizes a generative adversarial network for enhanced character extraction and recognition. Our extensive evaluations on five benchmark datasets demonstrate the effectiveness of our approach in handling real-world scene text reading challenges, achieving promising results compared to state-of-the-art methods.
• An efficient two-stage deep reinforcement learning-based framework is proposed.
• This framework is a deep learning-based solution suitable for text reading in the wild.
• This framework relies on character/word awareness-based image analysis.
• Efficient scene text localization with a hybrid SIFT-ResNet-based selective search.
• Interesting results are achieved on five public datasets with complex scenarios. |
---|---|
ISSN: | 1568-4946 |
DOI: | 10.1016/j.asoc.2025.112701 |
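
The abstract mentions a selective region proposal step driven by Scale-Invariant Feature Transform keypoints but does not say how keypoints become candidate regions. The sketch below is only an illustration of the general idea, not the authors' method: it assumes keypoint coordinates have already been extracted elsewhere, and it substitutes a simple single-linkage distance grouping. The function name `propose_regions` and the parameters `max_gap`, `min_points`, and `pad` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def propose_regions(keypoints: List[Point], max_gap: float = 20.0,
                    min_points: int = 3, pad: float = 2.0) -> List[Box]:
    """Group nearby keypoints and return one padded box per group.

    Illustrative stand-in for a keypoint-driven proposal step; the
    grouping rule and all thresholds are assumptions, not the paper's.
    """
    parent = list(range(len(keypoints)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of keypoints closer than max_gap (single linkage).
    for i, (xi, yi) in enumerate(keypoints):
        for j in range(i + 1, len(keypoints)):
            xj, yj = keypoints[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= max_gap ** 2:
                parent[find(i)] = find(j)

    # Collect the members of each connected group.
    groups: Dict[int, List[Point]] = {}
    for i, pt in enumerate(keypoints):
        groups.setdefault(find(i), []).append(pt)

    # Keep groups dense enough to be plausible text; emit padded boxes.
    boxes: List[Box] = []
    for pts in groups.values():
        if len(pts) < min_points:
            continue
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        boxes.append((min(xs) - pad, min(ys) - pad,
                      max(xs) + pad, max(ys) + pad))
    return sorted(boxes)
```

For example, calling `propose_regions` on two tight clusters of three points plus one isolated point yields two boxes and drops the isolated point. In a fuller pipeline, boxes like these would presumably be passed on for refinement by bounding box regression, as the abstract describes, but that stage is not sketched here.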