Binarization and Segmentation Framework for Sundanese Ancient Documents

Bibliographic details
Published in: Jurnal Sains Dasar, 2017-11, Vol. 6 (2), pp. 133-142
Main authors: Erick Paulus, Mira Suryani, Setiawan Hadi, Rahmat Sopian, Akik Hidayat
Format: Article
Language: ind
Description
Abstract: Binarization and segmentation are the first two important processes in an optical character recognition (OCR) system. For handwritten ancient document images, binarization remains a major challenge, mainly because the images are badly degraded and contain various kinds of noise in the non-text areas. After binarization, line-based segmentation is performed to separate each text line from the others. We propose a novel binarization and segmentation framework that enhances the performance of the Niblack binarization method and minimizes an energy function to find the path of the separator line between two text lines. For the experiments, we use 22 images from the Sundanese ancient documents Kropak 18 and Kropak 22. The evaluation metrics show that our proposed binarization improves the F-measure over the original Niblack method by 20% for Kropak 22 and by 50% for Kropak 18. We then present the influence of different input images, both true-color and binary, on text-line segmentation. In the line segmentation process, the binarized images from our proposed framework yield the same number of text lines as the number of target lines. Overall, our proposed framework produces promising results, so its output can be used as input images for the subsequent OCR process.
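The framework builds on the Niblack method, which compares each pixel with a local threshold T = m + k * s, where m and s are the mean and standard deviation of the grey levels in a window around the pixel and k is a weight. The sketch below shows only that baseline rule, not the authors' enhanced variant, which the abstract does not describe. The function name `niblack_threshold`, the window size of 15 and k = -0.2 are assumptions (k = -0.2 is a commonly cited choice for dark text on a light background).

```python
import math


def niblack_threshold(image, window=15, k=-0.2):
    """Baseline Niblack binarization (hypothetical helper, not the paper's code).

    image:  2-D list of grey levels (0 = black, 255 = white).
    window: odd side length of the square neighbourhood (assumed default).
    k:      Niblack weight; negative values suit dark ink on light paper.
    Returns a 2-D list with 1 for ink pixels and 0 for background.
    """
    h, w = len(image), len(image[0])
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Gather the neighbourhood, clipped at the image border.
            vals = [image[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            n = len(vals)
            mean = sum(vals) / n
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / n)
            t = mean + k * std  # Niblack's local threshold T = m + k * s
            # Pixels strictly darker than the local threshold count as ink.
            out[y][x] = 1 if image[y][x] < t else 0
    return out
```

For example, on a 5 x 5 image whose middle column is dark (20) and whose other pixels are light (200), `niblack_threshold(img, window=3)` marks only the middle column as ink. Because the threshold follows local statistics, plain Niblack also tends to pick up noise in textured, non-text regions, which is the weakness the abstract says the framework addresses.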
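The abstract states only that an energy function is minimized to find the separator path between two text lines. One common way to realise such a search, assumed here rather than taken from the paper, is a seam-carving-style dynamic program over the binarized image: each pixel costs 1 if it is ink and 0 otherwise, and the path advances one column at a time while shifting at most one row up or down. The function `min_energy_separator` and its `top`/`bottom` band arguments are hypothetical.

```python
def min_energy_separator(binary, top, bottom):
    """Left-to-right path of minimum total energy between rows top..bottom.

    binary: 2-D list with 1 = ink, 0 = background.
    Assumed energy: the pixel's ink value, so the path avoids crossing
    strokes whenever a blank route exists. Returns one row index per column.
    """
    w = len(binary[0])
    rows = list(range(top, bottom + 1))
    inf = float("inf")
    # cost[c][i]: cheapest path ending at row rows[i] in column c.
    cost = [[inf] * len(rows) for _ in range(w)]
    back = [[0] * len(rows) for _ in range(w)]
    for i, y in enumerate(rows):
        cost[0][i] = binary[y][0]
    for c in range(1, w):
        for i, y in enumerate(rows):
            # Predecessor: same row first (preferred on ties), then up, then down.
            candidates = [j for j in (i, i - 1, i + 1) if 0 <= j < len(rows)]
            best_j = min(candidates, key=lambda j: cost[c - 1][j])
            cost[c][i] = cost[c - 1][best_j] + binary[y][c]
            back[c][i] = best_j
    # Start the trace-back at the cheapest end point in the last column.
    i = min(range(len(rows)), key=lambda j: cost[w - 1][j])
    path = [0] * w
    for c in range(w - 1, -1, -1):
        path[c] = rows[i]
        i = back[c][i]
    return path
```

Because this stand-in energy ignores the distance to the neighbouring lines, ties let the path wander anywhere inside a blank gap; a practical energy would likely add further terms, which the abstract does not specify.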
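The binarization gains are reported as F-measure. The abstract does not say which variant is used; the sketch below assumes the common pixel-level form, where ink = 1 in both the predicted and ground-truth maps and F is the harmonic mean of precision and recall over ink pixels. `f_measure` is a hypothetical helper name.

```python
def f_measure(predicted, ground_truth):
    """Pixel-level F-measure of a binarized image (1 = ink, 0 = background)."""
    tp = fp = fn = 0
    for prow, grow in zip(predicted, ground_truth):
        for p, g in zip(prow, grow):
            if p and g:
                tp += 1  # ink correctly detected
            elif p:
                fp += 1  # background wrongly marked as ink
            elif g:
                fn += 1  # ink that was missed
    if tp == 0:
        return 0.0  # no correct ink pixels: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With one correct ink pixel, one false alarm and one miss, precision and recall are both 0.5, so the F-measure is 0.5.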
ISSN: 2085-9872, 2443-1273
DOI: 10.21831/j.sainddasar.v6i2.15314