Precise detection of Chinese characters in historical documents with deep reinforcement learning


Detailed description

Bibliographic details
Published in: Pattern Recognition, 2020-11, Vol. 107, p. 107503, Article 107503
Main authors: Sihang Wu, Jiapeng Wang, Weihong Ma, Lianwen Jin
Format: Article
Language: English
Description
Abstract:
• We propose precise detection of Chinese characters using a deep reinforcement learning framework to obtain tighter bounding boxes under large IoU.
• We propose a new PSRoI pooling method and a new dense reward function.
• The proposed deep reinforcement learning framework combines the advantages of three variants of Deep Q-Networks.
• Our method achieves state-of-the-art results on several Chinese historical document datasets.

The decision-making ability of deep reinforcement learning has proven successful in a variety of fields. Here, we use this method for precise character detection by fitting tight bounding boxes around the Chinese characters in historical documents. An agent is trained to learn a control policy that fine-tunes a bounding box step by step, modeled as a Markov Decision Process. We introduce a novel fully convolutional network with position-sensitive Region-of-Interest (RoI) pooling (FCPN). The network accepts character patches of arbitrary size as input and can fuse position information into the features of actions. In addition, we propose a dense reward function (DRF) that provides rewards tailored to the action taken and the environment state, improving the decision-making ability of the agent. Our approach is designed as a universal method that can be applied to the output of any character-level or word-level text detector to obtain more precise detection results. Application to the Tripitaka Koreana in Han (TKH) and Multiple Tripitaka in Han (MTH) datasets confirms the very promising performance of this method. In particular, our approach yields a significant improvement under a large Intersection over Union (IoU) threshold of 0.8. Experiments on the scene text datasets ICDAR2013 and ICDAR2015 further demonstrate the robustness and generality of the method.
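The step-by-step refinement described in the abstract can be illustrated with a small sketch. This record does not specify the paper's action set, reward formula, or network, so everything below is an assumption made for illustration: the eight one-pixel edge moves in `ACTIONS`, the reward defined as the change in IoU, and the greedy `refine` loop, which peeks at the ground-truth box and merely stands in for a trained Deep Q-Network policy.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical action set (not taken from the paper): move one edge by one pixel.
ACTIONS: Dict[str, Box] = {
    "left_in": (1, 0, 0, 0), "left_out": (-1, 0, 0, 0),
    "top_in": (0, 1, 0, 0), "top_out": (0, -1, 0, 0),
    "right_in": (0, 0, -1, 0), "right_out": (0, 0, 1, 0),
    "bottom_in": (0, 0, 0, -1), "bottom_out": (0, 0, 0, 1),
}


def apply_action(box: Box, name: str) -> Box:
    """Return the box obtained by applying the named edge move."""
    dx1, dy1, dx2, dy2 = ACTIONS[name]
    return (box[0] + dx1, box[1] + dy1, box[2] + dx2, box[3] + dy2)


def dense_reward(old: Box, new: Box, gt: Box) -> float:
    """Assumed dense reward: the IoU gained (or lost) by a single step."""
    return iou(new, gt) - iou(old, gt)


def refine(box: Box, gt: Box, max_steps: int = 50) -> Box:
    """Greedy stand-in for a learned policy: take the best-rewarded action
    at each step and stop when no action improves the IoU."""
    for _ in range(max_steps):
        best = max(ACTIONS, key=lambda n: dense_reward(box, apply_action(box, n), gt))
        candidate = apply_action(box, best)
        if dense_reward(box, candidate, gt) <= 0:
            break
        box = candidate
    return box


gt_box = (10, 10, 50, 50)
loose_box = (7, 8, 54, 53)  # a detector output that is slightly too large
refined_box = refine(loose_box, gt_box)
```

In this toy case the loose box starts below the 0.8 IoU threshold mentioned in the abstract and is tightened one edge move at a time. A reward issued at every step, rather than only at the end of an episode, is what makes the signal "dense"; a trained agent would have to choose actions from image features alone, without access to `gt_box`.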
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2020.107503