Hybrid Approach to Adaptive OCR for Historical Books

Bibliographic Details
Main authors: Kluzner, V., Tzadok, A., Chevion, D., Walach, E.
Format: Conference proceeding
Language: English
Description
Summary: Optical character recognition (OCR) technology is widely used to convert scanned documents to text. However, historical books remain a challenge for state-of-the-art OCR engines. This work proposes a new approach to the OCR of large bodies of text: an adaptive mechanism that adjusts itself to each text being processed. The approach significantly improves the OCR results. It uses a modified hierarchical optical flow with a second-order regularization term to compare each new character with the set of super-symbols (character templates) using distance maps. The classification process is based on a hybrid approach that combines measures of geometrical differences (spatial domain) and distortion gradients (feature domain).
ISSN: 1520-5363, 2379-2140
DOI: 10.1109/ICDAR.2011.183
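
The summary describes comparing each new character with a set of character templates through distance maps. The record gives no implementation details, so the following is only a minimal sketch of that general idea, not the authors' method: it replaces the modified hierarchical optical flow and the hybrid classifier with a plain mean absolute difference between distance maps. All names here (`parse`, `distance_map`, `map_distance`, `classify`, `TEMPLATES`) and the 5x5 toy glyphs are hypothetical and chosen for illustration.

```python
from math import hypot


def parse(rows):
    """Turn rows of '#' (ink) and '.' (background) into a boolean grid."""
    return [[ch == "#" for ch in row] for row in rows]


def distance_map(glyph):
    """Euclidean distance from every pixel to the nearest ink pixel.

    Brute force, so only suitable for small illustrative glyphs.
    """
    ink = [(r, c) for r, row in enumerate(glyph) for c, v in enumerate(row) if v]
    if not ink:
        raise ValueError("glyph has no ink pixels")
    return [
        [min(hypot(r - ir, c - ic) for ir, ic in ink) for c in range(len(row))]
        for r, row in enumerate(glyph)
    ]


def map_distance(a, b):
    """Mean absolute difference between two equally sized distance maps.

    Assumption: this simple measure stands in for the optical-flow
    comparison named in the summary.
    """
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def classify(glyph, templates):
    """Return the label of the closest template and all scores."""
    dm = distance_map(glyph)
    scores = {
        label: map_distance(dm, distance_map(template))
        for label, template in templates.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical templates: a vertical bar and a horizontal bar.
TEMPLATES = {
    "I": parse(["..#..", "..#..", "..#..", "..#..", "..#.."]),
    "-": parse([".....", ".....", "#####", ".....", "....."]),
}

# A vertical bar with one stray ink pixel, imitating print noise.
noisy = parse(["..#..", "..#..", "..##.", "..#..", "..#.."])

label, scores = classify(noisy, TEMPLATES)
print(label, scores)
```

Distance maps make the comparison tolerant of small defects: one stray pixel changes only a few nearby distance values, so the noisy glyph still scores closest to the vertical-bar template.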