Scene Text Detection and Segmentation Based on Cascaded Convolution Neural Networks

Scene text detection and segmentation are two important and challenging research problems in the field of computer vision. This paper proposes a novel method for scene text detection and segmentation based on cascaded convolution neural networks (CNNs). In this method, a CNN-based text-aware candida...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on image processing 2017-03, Vol.26 (3), p.1509-1520
Hauptverfasser:	Tang, Youbao, Wu, Xiangqian
Format:	Artikel
Sprache:	eng
Schlagworte:	Artificial neural networks candidate text region classification candidate text region refinement Convolution Feature extraction Image color analysis Image edge detection Image segmentation Neural networks Scene Text detection scene text segmentation text-aware candidate text region extraction Training
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Scene text detection and segmentation are two important and challenging research problems in the field of computer vision. This paper proposes a novel method for scene text detection and segmentation based on cascaded convolution neural networks (CNNs). In this method, a CNN-based text-aware candidate text region (CTR) extraction model (named detection network, DNet) is designed and trained using both the edges and the whole regions of text, with which coarse CTRs are detected. A CNN-based CTR refinement model (named segmentation network, SNet) is then constructed to precisely segment the coarse CTRs into text to get the refined CTRs. With DNet and SNet, much fewer CTRs are extracted than with traditional approaches while more true text regions are kept. The refined CTRs are finally classified using a CNN-based CTR classification model (named classification network, CNet) to get the final text regions. All of these CNN-based models are modified from VGGNet-16. Extensive experiments on three benchmark data sets demonstrate that the proposed method achieves the state-of-the-art performance and greatly outperforms other scene text detection and segmentation approaches.
ISSN:	1057-7149 1941-0042
DOI:	10.1109/TIP.2017.2656474