Dataset Generation for Gujarati Language Using Handwritten Character Images

Bibliographic Details
Published in: Wireless personal communications 2024-06, Vol. 136 (4), p. 2163-2184
Main authors: Suthar, Sanket B., Thakkar, Amit R.
Format: Article
Language: English
Description
Abstract: In pattern recognition, handwritten character recognition (HCR) is considered a classical challenge. In particular, benchmark datasets for HCR in the Gujarati language are limited, so a proper dataset is required for experimentation. Hence, this work introduces dataset generation for the Gujarati language using pre-processing and classification techniques. First, handwritten data is collected from various native Gujarati writers. Three processes are then carried out to generate the dataset. In the first, the pre-processing stages are performed: image selection, noise removal, normalization, conversion of integer values to double, conversion of the grayscale image into a binary image, dimensionality reduction, and vector conversion. Next, the pre-processed image is segmented using line segmentation, character segmentation, and word segmentation. Finally, the data are classified using a convolutional neural network (CNN). The kappa and FPR (false positive rate) values achieved by the CNN are 0.981 and 0.189, respectively.
ISSN: 0929-6212, 1572-834X
DOI: 10.1007/s11277-024-11369-9
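
Note on the reported metrics: the abstract gives kappa and FPR values but does not say how they were computed. The sketch below shows one common way to derive both from a multi-class confusion matrix, assuming Cohen's kappa and a one-vs-rest false positive rate averaged over classes. The function names and the toy matrix are hypothetical and are not taken from the article.

```python
def cohen_kappa(cm):
    """Cohen's kappa from a square confusion matrix (rows: true, columns: predicted)."""
    n = sum(sum(row) for row in cm)
    k = len(cm)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(cm[i][i] for i in range(k)) / n
    # Chance agreement: sum over classes of (row total * column total) / n^2.
    expected = sum(
        sum(cm[i]) * sum(row[i] for row in cm) for i in range(k)
    ) / (n * n)
    return (observed - expected) / (1 - expected)


def macro_fpr(cm):
    """Mean one-vs-rest false positive rate over all classes (an assumed averaging)."""
    n = sum(sum(row) for row in cm)
    k = len(cm)
    rates = []
    for c in range(k):
        predicted_c = sum(row[c] for row in cm)
        false_pos = predicted_c - cm[c][c]
        negatives = n - sum(cm[c])  # samples whose true class is not c
        rates.append(false_pos / negatives if negatives else 0.0)
    return sum(rates) / k


# Made-up 3-class confusion matrix, for illustration only.
toy = [
    [8, 1, 1],
    [0, 9, 1],
    [1, 0, 9],
]
print(round(cohen_kappa(toy), 3))
print(round(macro_fpr(toy), 3))
```

Other averaging choices (for example, pooling false positives across classes before dividing) give different FPR values, so the numbers in the abstract should be read with the article's own definition in mind.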