Towards complex scene text reading with selective region proposal and two-stage deep reinforcement learning

Bibliographic details
Published in: Applied Soft Computing 2025-02, Vol.170, p.112701, Article 112701
Main authors: Harizi, Riadh; Walha, Rim; Drira, Fadoua
Format: Article
Language: English
Online access: Full text
Description
Summary: The challenging task of accurately detecting and recognizing text from images captured in intricate real-world settings is known as text reading. The inherent complexity and variability of real-world scenes often make conventional optical character recognition systems ineffective. Furthermore, although deep learning-based systems perform well with horizontal text in natural scenes, they frequently encounter difficulties with oriented text. To address these limitations, we propose a multi-oriented scene text reading framework utilizing a selective region proposal technique based on Scale-Invariant Feature Transform (SIFT) keypoints. This approach focuses on relevant text regions, enhancing efficiency and precision. We further refine text localization through bounding box regression. A two-stage deep reinforcement learning framework, incorporating character/word awareness, is employed to correct and align detected characters, and to validate annotated words. This framework utilizes a generative adversarial network for enhanced character extraction and recognition. Our extensive evaluations on five benchmark datasets demonstrate the effectiveness of our approach in handling real-world scene text reading challenges, achieving promising results compared to state-of-the-art methods.

• An efficient two-stage deep reinforcement learning-based framework is proposed.
• This framework is a deep learning-based solution suitable for text reading in the wild.
• This framework relies on character/word awareness-based image analysis.
• Efficient scene text localization with a hybrid SIFT-ResNet-based selective search.
• Interesting results are achieved on five public datasets with complex scenarios.
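
The abstract above does not say how keypoints are turned into region proposals, so the following is only a minimal sketch of the general idea of keeping image areas where keypoints are dense. It assumes the keypoints have already been extracted (for example by a SIFT detector) and are given as plain (x, y) pairs. The function name `propose_regions`, the grid-based grouping rule, and the parameters `cell` and `min_points` are hypothetical illustrations, not the authors' method.

```python
from collections import defaultdict


def propose_regions(keypoints, cell=32, min_points=3):
    """Group (x, y) keypoints into candidate boxes (x_min, y_min, x_max, y_max).

    Hypothetical grouping rule: bin keypoints into square grid cells,
    keep cells with at least `min_points` keypoints, and merge
    neighbouring dense cells into one candidate region.
    """
    # Bin each keypoint into a square grid cell of side `cell` pixels.
    cells = defaultdict(list)
    for x, y in keypoints:
        cells[(int(x // cell), int(y // cell))].append((x, y))

    # Keep only cells that hold enough keypoints.
    dense = {c for c, pts in cells.items() if len(pts) >= min_points}

    boxes, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        # Flood-fill over 8-connected dense cells to form one region.
        stack, points = [start], []
        seen.add(start)
        while stack:
            cx, cy = stack.pop()
            points.extend(cells[(cx, cy)])
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (cx + dx, cy + dy)
                    if neighbour in dense and neighbour not in seen:
                        seen.add(neighbour)
                        stack.append(neighbour)
        # The proposal is the tight box around the region's keypoints.
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```

In a full pipeline, each returned box would be a candidate passed on to later stages such as bounding box regression. Isolated keypoints fall in sparse cells and produce no proposal, which is the sense in which the proposal is selective in this sketch.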
ISSN:1568-4946
DOI:10.1016/j.asoc.2025.112701