Textline information extraction from grayscale camera-captured document images

Bibliographic details
Main authors: Bukhari, S.S., Breuel, T.M., Shafait, F.
Format: Conference proceedings
Language: English
Description
Abstract: Cameras offer flexible document imaging, but the captured images suffer from uneven shading and non-planar page shape. Camera-captured documents therefore need to be dewarped before they can be processed by traditional text recognition methods. Curled textline detection is an important step of dewarping. Previous approaches to curled textline detection use binarization as a pre-processing step, which can degrade the detection results under uneven shading. Furthermore, these approaches are sensitive to high degrees of curl and estimate x-line and baseline pairs using regression, which may result in inaccurate estimates. We introduce a novel curled textline detection approach for grayscale document images. First, the textline structure is enhanced using matched filter bank smoothing, and the central lines of textlines are then detected using ridges. Next, x-line and baseline pairs are estimated by adapting active contours (snakes) over the ridges. Unlike other approaches, our approach does not use binarization and operates directly on grayscale images. We achieved 91% detection accuracy, with good estimation of x-line and baseline pairs, on the dataset of the CBDAR 2007 document image dewarping contest.
ISSN: 1522-4880, 2381-8549
DOI:10.1109/ICIP.2009.5413799
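
The first two steps named in the abstract (filter bank smoothing, then ridge detection) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no parameters, so the oriented anisotropic Gaussian kernels, the orientation set, the sigma values, the per-column local-maximum ridge test, and all function names below are assumptions chosen for illustration. The active contour (snake) step is omitted entirely.

```python
import math

def oriented_gaussian_kernel(sigma_long, sigma_short, theta, radius):
    """Normalized anisotropic Gaussian elongated along angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    kernel = []
    for dy in range(-radius, radius + 1):
        row = []
        for dx in range(-radius, radius + 1):
            u = dx * c + dy * s     # coordinate along the assumed line direction
            v = -dx * s + dy * c    # coordinate across the line
            row.append(math.exp(-0.5 * ((u / sigma_long) ** 2 + (v / sigma_short) ** 2)))
        kernel.append(row)
    total = sum(map(sum, kernel))
    return [[wgt / total for wgt in row] for row in kernel]

def filter_image(img, kernel):
    """Correlate img with kernel, replicating border pixels."""
    h, w = len(img), len(img[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                yy = min(max(y + dy, 0), h - 1)
                for dx in range(-r, r + 1):
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + r][dx + r] * img[yy][xx]
            out[y][x] = acc
    return out

def enhance_textlines(gray, thetas, sigma_long=4.0, sigma_short=1.5, radius=4):
    """Smooth with every kernel in the bank and keep the strongest response per pixel."""
    ink = [[255.0 - p for p in row] for row in gray]  # dark text becomes bright
    responses = [
        filter_image(ink, oriented_gaussian_kernel(sigma_long, sigma_short, t, radius))
        for t in thetas
    ]
    h, w = len(gray), len(gray[0])
    return [[max(resp[y][x] for resp in responses) for x in range(w)] for y in range(h)]

def ridge_rows(enhanced, min_strength):
    """For each column, list the rows that are vertical local maxima above min_strength."""
    h, w = len(enhanced), len(enhanced[0])
    ridges = []
    for x in range(w):
        rows = []
        for y in range(1, h - 1):
            v = enhanced[y][x]
            if v >= min_strength and v > enhanced[y - 1][x] and v >= enhanced[y + 1][x]:
                rows.append(y)
        ridges.append(rows)
    return ridges

# Synthetic grayscale page: white background with two dark bands standing in for textlines.
height, width = 20, 24
page = [[255.0] * width for _ in range(height)]
for band_top in (4, 13):
    for y in range(band_top, band_top + 3):
        page[y] = [0.0] * width

enhanced = enhance_textlines(page, thetas=[-0.2, 0.0, 0.2])
ridges = ridge_rows(enhanced, min_strength=100.0)
```

No thresholding of the input into black and white takes place: the ridge test runs on the smoothed gray values, which is the property the abstract emphasizes. A real page would need a wider range of orientations and scales than the three kernels assumed here.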