DenseNet-CTC: An end-to-end RNN-free architecture for context-free string recognition

String recognition is one of the challenging tasks in document analysis and recognition areas. Recently, with the surge of interest in end-to-end segmentation-free methods, CRNN (Convolution Recurrent Neural Network), which is a combination of CNN (Convolutional Neural Network) and RNN-CTC (Recurren...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Computer vision and image understanding 2021-03, Vol.204, p.103168, Article 103168
Hauptverfasser: Zhan, Hongjian, Lyu, Shujing, Lu, Yue, Pal, Umapada
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:String recognition is one of the challenging tasks in document analysis and recognition areas. Recently, with the surge of interest in end-to-end segmentation-free methods, CRNN (Convolution Recurrent Neural Network), which is a combination of CNN (Convolutional Neural Network) and RNN-CTC (Recurrent Neural Network-Connectionist Temporal Classification), has been widely applied to string recognition. However, in some context-free cases, where a character is followed by arbitrary characters like the digit string, there may be no or very few context links in these strings. In this paper, we propose a new end-to-end RNN-free architecture especially for context-free string recognition and apply it to Handwritten Digit String Recognition (HDSR) task. The proposed architecture is based on CNN and CTC, but without the usage of RNN, and we apply column-wise fully connected layers to connect the convolutional layers and CTC directly. Moreover, to compensate for the possible reduction in modeling capabilities caused by the absence of RNN, we apply densely connected convolutional layers to extract efficient features. We test this new architecture on three public HDSR benchmarks (ORAND-CAR-A, ORAND-CAR-B and CVL HDS) and three other datasets that include a handwritten telephone/postcode dataset PhPAIS and two non-Arabic digit datasets (C-Bangla and C-Hindi). Furthermore, we generate three handwritten digit string datasets to further analyze the influence of RNN. The recognition results on all datasets demonstrate the superiority of the proposed model. •We propose a new RNN-free architecture for handwritten string recognition task.•We achieve the state-of-the-art performances on several HDSR benchmarks.•We show that how RNNs affect the recognition performance.•Our proposed method has faster speed and fewer parameters than RNN-based models.
ISSN:1077-3142
1090-235X
DOI:10.1016/j.cviu.2021.103168