A comprehensive handwritten image corpus of isolated Persian/Arabic characters for OCR development and evaluation
Main authors: | , , , |
---|---|
Format: | Conference paper |
Language: | eng |
Subjects: | |
Summary: | This paper reports the specifications, design, and implementation issues of a comprehensive corpus of capital isolated handwritten character images for the Persian/Arabic languages. The corpus has been designed for both OCR development and evaluation purposes. It contains more than 10 million characters with appropriate image quality and is supported by rich metadata in a standard ground-truth format. An evaluation of the corpus's accuracy revealed that more than 99.9% of the images are correctly labeled and that the quality of more than 99.5% of the images is suitable for OCR development and evaluation. The corpus may be used as a standard benchmark for Persian/Arabic OCR systems. |
---|---|
DOI: | 10.1109/ISSPA.2007.4555567 |