Text segmentation in degraded historical document images

Text segmentation from degraded Historical Indus script images helps Optical Character Recognizer (OCR) to achieve good recognition rates for Hindus scripts; however, it is challenging due to complex background in such images. In this paper, we present a new method for segmenting text and non-text i...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Egyptian informatics journal 2016-07, Vol.17 (2), p.189-197
Hauptverfasser:	Kavitha, A.S., Shivakumara, P., Kumar, G.H., Lu, Tong
Format:	Artikel
Sprache:	eng
Schlagworte:	Clustering Indus document Sobel and Laplacian operations Text enhancement Text line segmentation
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Text segmentation from degraded Historical Indus script images helps Optical Character Recognizer (OCR) to achieve good recognition rates for Hindus scripts; however, it is challenging due to complex background in such images. In this paper, we present a new method for segmenting text and non-text in Indus documents based on the fact that text components are less cursive compared to non-text ones. To achieve this, we propose a new combination of Sobel and Laplacian for enhancing degraded low contrast pixels. Then the proposed method generates skeletons for text components in enhanced images to reduce computational burdens, which in turn helps in studying component structures efficiently. We propose to study the cursiveness of components based on branch information to remove false text components. The proposed method introduces the nearest neighbor criterion for grouping components in the same line, which results in clusters. Furthermore, the proposed method classifies these clusters into text and non-text cluster based on characteristics of text components. We evaluate the proposed method on a large dataset containing varieties of images. The results are compared with the existing methods to show that the proposed method is effective in terms of recall and precision.
ISSN:	1110-8665 2090-4754
DOI:	10.1016/j.eij.2015.11.003