Online continuous multi-stroke Persian/Arabic character recognition by novel spatio-temporal features for digitizer pen devices

Bibliographic details
Published in: Neural computing & applications 2020-04, Vol. 32 (8), p. 3853-3872
Main authors: Valikhani, Sara; Abdali-Mohammadi, Fardin; Fathi, Abdolhossein
Format: Article
Language: English
Online access: Full text
Description
Summary: Digitizer pens have become the front end of many digital devices, and the increasing use of this technology has created a need for pen-based virtual keyboard systems. Despite attempts to create such systems for English, their absence for the Persian/Arabic languages is an obvious gap. The goal of this paper is to present an online continuous Persian/Arabic character recognition method. A character in Persian/Arabic is made up of two types of signs, or strokes: a main body and delayed strokes (of which there may be zero or more). In this paper, a set of novel and discriminative spatial features is defined for these strokes. These features are then used in a novel algorithm to create a genetic programming-based decision tree called GPDT. The GPDT and the spatio-temporal features are used by non-deterministic finite automata (NDFA) to recognize group-related strokes and their related characters. Spatio-temporal features are needed because some Persian/Arabic letters share the same main body (e.g., “ح، خ، ج، چ”). Two other issues arise in recognizing Persian/Arabic letters: the unknown number of delayed stroke segments and the identical placement of delayed strokes; both are resolved by using an NDFA. After the main-body group is identified with the help of the GPDT, each recognized stroke triggers a move in the NDFA, which stops in a character state (the final state at the end of a path in the NDFA). The proposed algorithm recognizes continuous Persian/Arabic letters and digits with 92.43% accuracy and isolated letters and digits with 97.52% accuracy.
ISSN: 0941-0643, 1433-3058
DOI:10.1007/s00521-019-04225-6
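
The NDFA step described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the state names, the stroke tokens such as ("dot", "below"), and the transition table are all hypothetical, and the GPDT classifier is replaced by a main-body group label passed in directly. The sketch only shows how each delayed stroke could trigger a move, and how non-determinism copes with an unknown number of delayed stroke segments for letters that share one main body.

```python
from typing import Dict, Iterable, Set, Tuple

# A delayed stroke is modelled as (kind, position). These tokens are
# illustrative assumptions, not the features defined in the paper.
Stroke = Tuple[str, str]

# Hypothetical transitions for the main-body group shared by ح خ ج چ.
# A (state, stroke) pair may lead to several states: that is the
# non-deterministic part.
TRANSITIONS: Dict[Tuple[str, Stroke], Set[str]] = {
    ("HAH_GROUP", ("dot", "above")): {"KHAH"},
    # One dot below may be a finished ج or the first dot of چ.
    ("HAH_GROUP", ("dot", "below")): {"JEEM", "CHEH_1"},
    ("CHEH_1", ("dot", "below")): {"CHEH_2"},
    ("CHEH_2", ("dot", "below")): {"CHEH"},
    # The three dots may also be written as a single stroke.
    ("HAH_GROUP", ("three_dots", "below")): {"CHEH"},
}

# Character (final) states and the letters they stand for.
ACCEPTING: Dict[str, str] = {
    "HAH_GROUP": "ح",  # main body with zero delayed strokes
    "KHAH": "خ",
    "JEEM": "ج",
    "CHEH": "چ",
}


def recognize(group: str, delayed: Iterable[Stroke]) -> Set[str]:
    """Return every letter whose character state is reached.

    `group` stands in for the main-body group that a classifier
    (the GPDT in the paper) would output; here it is given directly.
    """
    current = {group}
    for stroke in delayed:
        following: Set[str] = set()
        for state in current:
            following |= TRANSITIONS.get((state, stroke), set())
        current = following
    return {ACCEPTING[s] for s in current if s in ACCEPTING}
```

For example, calling `recognize("HAH_GROUP", [("dot", "below")] * 3)` follows the path through the two intermediate states and stops in the character state for چ, while two dots below end in a non-final state and yield no letter.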