Preliminary research for provision of Javanese script image dataset from Javanese script printed book

Bibliographic details
Main authors: Widiarti, Anastasia Rita, Prima, Gabriel Ryan, Adi, Ciprianus Kuntoro
Format: Conference proceedings
Language: English
Online access: Full text
Description
Summary: Developing a system that transliterates Javanese script into other scripts using a character recognition approach first requires training data: images of the script in all its possible forms. In this study, the dataset is sourced from script images taken from a book written in Javanese and processed with image processing techniques, after which the captured images are grouped into their respective classes. The study starts by pre-processing the document images (binarization, inversion, and filtering), followed by script segmentation using the projection profile method. Features are then extracted from each script image using the Intensity of Character (IoC) algorithm, and the resulting feature vectors are grouped with the K-Means clustering algorithm. The data came from scans of pages 2 and 59 of Hamong Tani’s book. Pre-processing and segmentation yielded 597 Javanese script images. Using the 3x3 IoC feature with the number of groups set to 65 classes, the grouping achieved a silhouette index of 0.5060. Measured against the ground truth, the grouping accuracy was 86%. The authors conclude that the steps taken in this research can serve as a model for producing a Javanese script image dataset.
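
The segmentation and feature-extraction steps named in the summary can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the IoC feature is the count of ink pixels in each cell of a 3x3 grid laid over a glyph (the summary does not define it), that a fixed threshold of 128 is adequate for binarization, and that glyphs are separated by fully blank rows and columns. The function names (`binarize`, `runs`, `segment`, `ioc_features`) and the toy page are hypothetical. Filtering and K-Means clustering are omitted.

```python
def binarize(gray, threshold=128):
    """Binarize and invert in one step: ink (dark) pixels become 1, paper 0.
    The threshold value is an assumption, not taken from the paper."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def runs(profile):
    """Return (start, end) pairs for each stretch of non-zero profile values."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(binary):
    """Projection-profile segmentation: cut text lines where the row sums
    drop to zero, then cut glyphs where the column sums drop to zero."""
    glyphs = []
    row_profile = [sum(row) for row in binary]
    for top, bottom in runs(row_profile):
        line = binary[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]
        for left, right in runs(col_profile):
            glyphs.append([row[left:right] for row in line])
    return glyphs


def ioc_features(glyph, n=3):
    """Assumed IoC definition: number of ink pixels in each cell of an
    n x n grid over the glyph, read row by row."""
    height, width = len(glyph), len(glyph[0])
    features = []
    for i in range(n):
        r0, r1 = i * height // n, (i + 1) * height // n
        for j in range(n):
            c0, c1 = j * width // n, (j + 1) * width // n
            features.append(sum(sum(row[c0:c1]) for row in glyph[r0:r1]))
    return features


# Toy grayscale "page": a solid 3x3 block and a 3x3 diagonal on one text line.
W, K = 255, 0  # white paper, black ink
page = [
    [W, W, W, W, W, W, W, W, W],
    [W, K, K, K, W, K, W, W, W],
    [W, K, K, K, W, W, K, W, W],
    [W, K, K, K, W, W, W, K, W],
    [W, W, W, W, W, W, W, W, W],
]

glyphs = segment(binarize(page))
features = [ioc_features(g) for g in glyphs]
for vector in features:
    print(vector)
```

Each glyph yields a nine-element vector, which is the kind of input a clustering step such as K-Means would then group. Real Javanese pages contain stacked and attached marks, so a plain zero-gap projection profile like this one would likely need refinement.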
ISSN: 0094-243X, 1551-7616
DOI:10.1063/5.0201159