A Convolutional Neural Network-Based Chinese Text Detection Algorithm via Text Structure Modeling

Text detection in a natural environment plays an important role in many computer vision applications. While existing text detection methods are focused on English characters, there are strong application demands on text detection in other languages, such as Chinese. In this paper, we present a novel...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on multimedia 2017-03, Vol.19 (3), p.506-518
Hauptverfasser:	Ren, Xiaohang, Zhou, Yi, He, Jianhua, Chen, Kai, Yang, Xiaokang, Sun, Jun
Format:	Artikel
Sprache:	eng
Schlagworte:	Artificial neural networks Chinese text detection Computer vision convolutional neural network (CNN) Detection algorithms Detectors Feature extraction Image edge detection Machine learning Neural networks text structure detector Unsupervised learning
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Text detection in a natural environment plays an important role in many computer vision applications. While existing text detection methods are focused on English characters, there are strong application demands on text detection in other languages, such as Chinese. In this paper, we present a novel text detection algorithm for Chinese characters based on a specific designed convolutional neural network (CNN). The CNN contains a text structure component detector layer, a spatial pyramid layer, and a multi-input-layer deep belief network (DBN). The CNN is pre-trained via a convolutional sparse auto-encoder, specifically designed for extracting complex features from Chinese characters. In particular, the text structure component detectors enhance the accuracy and uniqueness of feature descriptors by extracting multiple text structure components in various ways. The spatial pyramid layer enhances the scale invariability of the CNN for detecting texts in multiple scales. Finally, the multi-input-layer DBN replaces the fully connected layers in the CNN to ensure features from multiple scales are comparable. A multilingual text detection dataset, in which texts in Chinese, English, and digits are labeled separately, is set up to evaluate the proposed text detection algorithm. The proposed algorithm shows a significant performance improvement over the baseline CNN algorithms. In addition the proposed algorithm is evaluated over a public multilingual benchmark and achieves state-of-the-art result under multiple languages. Furthermore, a simplified version of the proposed algorithm with only general components is evaluated on the ICDAR 2011 and 2013 datasets, showing comparable detection performance to the existing general text detection algorithms.
ISSN:	1520-9210 1941-0077
DOI:	10.1109/TMM.2016.2625259