ANALYSIS OF CANONICAL CHARACTER SEGMENTATION TECHNIQUE FOR ANCIENT TELUGU TEXT DOCUMENTS

Bibliographic details
Published in: Journal of Theoretical and Applied Information Technology 2015-12, Vol.82 (2), p.311-311
Main authors: Rao, N Venkata; Sastry, A S C S; Chakravarthy, A S N; Rao, A V Srinivasa
Format: Article
Language: English
Online access: Full text
Description
Summary: Character recognition in ancient document images remains a challenging task. The initial scanning process deforms the document image, while aging of the document introduces unwanted background noise. Segmentation is an essential step in OCR. Complex scripts, such as the derivatives of Brahmi, encounter various problems in the segmentation process. A hybrid model is proposed that entails segmentation of noisy images followed by binarization. In the first phase, a technique is proposed for segmenting the ancient Telugu document image into meaningful units. The horizontal profile pattern is convolved with a Gaussian kernel. The statistical properties of the meaningful units are explored through an extensive analysis of their geometrical patterns. In the second phase, noisy documents are cleaned with the Modified IGT algorithm and then segmented using the conventional profile mechanism. The performance of the hybrid technique is demonstrated by the higher segmentation efficiencies obtained for the cleaned documents. The efficiency analysis of segmentation carried out for the hybrid technique reveals a threshold number of vowels (V), consonants (C), and CV core characters needed to exhibit higher efficiencies. It also reflects on the non-canonical features of any other marks in the Telugu document.
ISSN: 1817-3195
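
The first phase described in the summary (a horizontal profile convolved with a Gaussian kernel) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `sigma` value, and the relative-threshold rule used to turn the smoothed profile into line boundaries are all assumptions made for illustration, since the record does not say how boundaries are extracted. The Modified IGT cleaning step is not covered here.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Return a normalized 1-D Gaussian kernel spanning about +/- 3 sigma."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def smoothed_horizontal_profile(binary: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Count ink pixels in each row, then smooth the counts with a Gaussian kernel."""
    profile = binary.sum(axis=1).astype(float)
    return np.convolve(profile, gaussian_kernel(sigma), mode="same")


def segment_lines(binary: np.ndarray, sigma: float = 2.0, rel_threshold: float = 0.1):
    """Return (start_row, end_row) pairs, end exclusive, for each text band.

    Assumption: a row belongs to a text band when its smoothed profile value
    exceeds rel_threshold times the profile maximum.
    """
    profile = smoothed_horizontal_profile(binary, sigma)
    if profile.max() == 0:
        return []  # blank page: nothing to segment
    is_text = profile > rel_threshold * profile.max()
    # Pad with False so every band has both a rising and a falling edge.
    padded = np.concatenate(([False], is_text, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[0::2], changes[1::2])]


# Hypothetical input: a 40 x 30 binary page (1 = ink) with two synthetic text lines.
page = np.zeros((40, 30), dtype=np.uint8)
page[5:10, :] = 1
page[25:30, :] = 1
lines = segment_lines(page, sigma=1.0)
print(lines)
```

Smoothing widens each detected band slightly beyond the true ink rows. In return, small gaps inside a line (for example between a base character and a mark above or below it) are less likely to split the line in two. A larger `sigma` merges more aggressively, so it would need tuning per document.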