An Encoder-Decoder Neural Network for Indefinite Length Digit Sequences in Natural Scene Recognition

Bibliographic Details
Published in: Journal of physics. Conference series 2019-11, Vol.1345 (2), p.22025
Main authors: Zhang, Tong; Zhang, Erhan; Zhang, Hanfeng; Shen, Feihong; Guo, Dongwei; Lin, Kaichao
Format: Article
Language: English
Description
Summary: Extracting text information from raw images has long been a popular and difficult problem in computer vision research. Because of image blur, uneven illumination, and complex backgrounds, digit recognition in natural scenes often falls short of the desired accuracy. In this paper, an encoder-decoder neural network model is proposed for recognizing digit sequences of indefinite length in natural scenes. The encoder is a convolutional neural network (CNN), and the decoder is a long short-term memory (LSTM) network. The encoder accepts a fixed-format image and outputs a fixed-length feature vector; the decoder accepts the feature vector and outputs a predicted sequence of indefinite length. Validation on the Google Street View House Numbers (SVHN) dataset shows that the method performs well: after 10 hours of training, it reaches an accuracy of 96.57%.
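The summary does not say how the decoder turns a fixed-length feature vector into a sequence of indefinite length. One common approach, assumed here and not confirmed by this record, is to emit one digit per step until the decoder predicts a dedicated end-of-sequence class. The sketch below is illustrative only: `EOS`, `greedy_decode`, `toy_step`, and `max_len` are hypothetical names, and `toy_step` is a deterministic stand-in for a trained LSTM step, not the authors' model.

```python
from typing import Callable, List, Sequence, Tuple

EOS = 10  # hypothetical end-of-sequence class; digits use classes 0-9

# Toy decoder state: (fixed-length feature vector, current time step).
# In a real LSTM decoder this would be the hidden and cell state instead.
State = Tuple[Tuple[float, ...], int]
StepFn = Callable[[State, int], Tuple[List[float], State]]


def greedy_decode(feature: Sequence[float], step: StepFn, max_len: int = 8) -> List[int]:
    """Emit one digit per step until `step` predicts EOS or max_len is reached."""
    state: State = (tuple(feature), 0)  # the encoder output seeds the decoder
    prev = EOS                          # assumption: EOS doubles as the start token
    digits: List[int] = []
    for _ in range(max_len):
        scores, state = step(state, prev)
        prev = max(range(len(scores)), key=scores.__getitem__)  # argmax over classes
        if prev == EOS:
            break
        digits.append(prev)
    return digits


def toy_step(state: State, prev: int) -> Tuple[List[float], State]:
    """Stand-in for one LSTM step: scores the class stored at the current position.

    It ignores `prev`; a trained decoder would condition on it.
    """
    feature, pos = state
    target = int(feature[pos]) if pos < len(feature) else EOS
    scores = [0.0] * 11  # 10 digit classes plus EOS
    scores[target] = 1.0
    return scores, (feature, pos + 1)


# Two fixed-length (5-element) toy feature vectors yield outputs of different lengths.
short = greedy_decode((1.0, 9.0, 10.0, 10.0, 10.0), toy_step)
longer = greedy_decode((2.0, 0.0, 4.0, 8.0, 10.0), toy_step)
print(short, longer)
```

The `max_len` cap is a safeguard so that decoding terminates even if the end-of-sequence class is never predicted.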
ISSN: 1742-6588, 1742-6596
DOI: 10.1088/1742-6596/1345/2/022025