A Scalable Handwritten Text Recognition System
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Many studies on (Offline) Handwritten Text Recognition (HTR) systems have
focused on building state-of-the-art models for line recognition on small
corpora. However, adding HTR capability to a large scale multilingual OCR
system poses new challenges. This paper addresses three problems in building
such systems: data, efficiency, and integration. Firstly, one of the biggest
challenges is obtaining sufficient amounts of high quality training data. We
address the problem by using online handwriting data collected for a large
scale production online handwriting recognition system. We describe our image
data generation pipeline and study how online data can be used to build HTR
models. We show that the data improve the models significantly when only a
small number of real images is available, which is usually the case for HTR
models. This enables us to support a new script at substantially lower cost.
Secondly, we propose a line recognition model based on neural networks
without recurrent connections. The model achieves accuracy comparable to
LSTM-based models while allowing for better
parallelism in training and inference. Finally, we present a simple way to
integrate HTR models into an OCR system. These constitute a solution to bring
HTR capability into a large scale OCR system. |
---|---|
DOI: | 10.48550/arxiv.1904.09150 |
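
The summary mentions generating images from online handwriting data but gives no details of the pipeline. As a rough illustration only, the sketch below assumes online ink is stored as lists of (x, y) pen points and rasterizes it into a fixed-height binary line image. The function name `render_ink`, the `height` and `pad` parameters, and the straight-segment drawing are all assumptions made for this example, not the authors' method.

```python
from typing import List, Tuple

Stroke = List[Tuple[float, float]]  # one pen-down trajectory of (x, y) points


def render_ink(strokes: List[Stroke], height: int = 32, pad: int = 2) -> List[List[int]]:
    """Rasterize online strokes into a binary line image (1 = ink).

    Hypothetical helper: scales the ink to a fixed height, preserves the
    aspect ratio, and draws straight segments between consecutive points.
    """
    if height <= 2 * pad + 1:
        raise ValueError("height must exceed 2 * pad + 1")
    points = [p for stroke in strokes for p in stroke]
    if not points:
        raise ValueError("no ink points to render")

    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    span_y = max(ys) - min_y
    # Scale so the ink fills the height minus padding; flat ink keeps scale 1.
    scale = (height - 1 - 2 * pad) / span_y if span_y > 0 else 1.0
    width = int(round((max(xs) - min_x) * scale)) + 1 + 2 * pad
    image = [[0] * width for _ in range(height)]

    def to_pixel(x: float, y: float) -> Tuple[int, int]:
        return (int(round((x - min_x) * scale)) + pad,
                int(round((y - min_y) * scale)) + pad)

    for stroke in strokes:
        pixels = [to_pixel(x, y) for x, y in stroke]
        if len(pixels) == 1:
            pixels = pixels * 2  # a single-point stroke (a dot) still leaves ink
        for (x0, y0), (x1, y1) in zip(pixels, pixels[1:]):
            steps = max(abs(x1 - x0), abs(y1 - y0), 1)
            for i in range(steps + 1):
                t = i / steps
                col = int(round(x0 + t * (x1 - x0)))
                row = int(round(y0 + t * (y1 - y0)))
                image[row][col] = 1
    return image


# Example ink: one diagonal stroke followed by a dot.
image = render_ink([[(0, 0), (27, 27)], [(40, 10)]])
```

A realistic pipeline would presumably also vary stroke width and add background and degradation effects so that rendered images resemble scanned or photographed handwriting; those steps are omitted here.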