End-to-End Historical Handwritten Ethiopic Text Recognition Using Deep Learning


Bibliographic Details
Published in: IEEE Access 2023, Vol. 11, p. 99535-99545
Main authors: Malhotra, Ruchika; Addis, Maru Tesfaye
Format: Article
Language: English
Description
Summary: Recognizing handwritten text is a challenging task, especially for scripts with many characters and symbols. The Ethiopic script has a vast character set and is used for historical documents in typewritten, handwritten, and hand-printed forms. However, despite its importance as an ancient script, optical character recognition research has paid little attention to Ethiopic text recognition. In recent years, deep learning (DL) has emerged as a powerful technique for recognizing patterns. In this study, a DL approach is used to recognize historical Ethiopic handwritten texts. The recognition model employs an end-to-end strategy that enables sequential feature extraction and efficient recognition. At the core of the model is an attention mechanism coupled with a connectionist temporal classification (CTC) architecture. The model also includes seven convolutional neural networks and two recurrent neural networks. The training data is enlarged with data augmentation techniques to address the data scarcity common in deep learning applications. The experiments use an original training dataset of 79,684 historical handwritten images and an augmented dataset of 10,000 images containing Ethiopic texts. The recognition model showed promising results: on "Test Set I" (6,150 samples) the character error rate (CER) was 17.95%, and on "Test Set II" (15,935 samples) the CER was 29.95%. These outcomes indicate that the approach has the potential to improve the recognition of historical handwritten Ethiopic text.
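The summary reports results as a character error rate (CER) without spelling out the formula. A minimal Python sketch of the usual definition follows, assuming CER is the total character-level edit distance (substitutions, insertions, deletions) divided by the total number of reference characters, expressed as a percentage. The function names and sample strings are illustrative assumptions, not taken from the article, and the authors' exact evaluation procedure may differ.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted per Unicode code point."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """CER in percent over a set of (reference, hypothesis) pairs.

    Assumed definition: sum of edit distances / sum of reference lengths.
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_edits / total_chars


# Hypothetical example: one substituted character out of eight reference characters.
references = ["ሰላም", "ኢትዮጵያ"]
hypotheses = ["ሰላም", "ኢትዮጲያ"]
print(character_error_rate(references, hypotheses))  # 12.5
```

Each Ethiopic syllable is a single Unicode code point, so counting by Python string length matches counting by character here; scripts that rely on combining marks would need grapheme-level segmentation first.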
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3314334