A knowledge-based recognition system for historical Mongolian documents
This paper proposes a knowledge-based system to recognize historical Mongolian documents in which the words exhibit remarkable variation and character overlapping. According to the characteristics of Mongolian word formation, the system combines a holistic scheme and a segmentation-based scheme for...
Gespeichert in:
Veröffentlicht in: | International journal on document analysis and recognition 2016-09, Vol.19 (3), p.221-235 |
---|---|
Hauptverfasser: | , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | This paper proposes a knowledge-based system to recognize historical Mongolian documents in which the words exhibit remarkable variation and character overlapping. According to the characteristics of Mongolian word formation, the system combines a holistic scheme and a segmentation-based scheme for word recognition. Several types of words and isolated suffixes that cannot be segmented into glyph-units or do not require segmentation are recognized using the holistic scheme. The remaining words are recognized using the segmentation-based scheme, which is the focus of this paper. We exploit the knowledge of the glyph characteristics to segment words into glyph-units in the segmentation-based scheme. Convolutional neural networks are employed not only for word recognition in the holistic scheme, but also for glyph-unit recognition in the segmentation-based scheme. Based on the analysis of recognition errors in the segmentation-based scheme, the system is enhanced by integrating three strategies into glyph-unit recognition. These strategies involve incorporating baseline information, glyph-unit grouping, and recognizing under-segmented and over-segmented fragments. The proposed system achieves 80.86 % word accuracy on the Mongolian Kanjur test samples. |
---|---|
ISSN: | 1433-2833 1433-2825 |
DOI: | 10.1007/s10032-016-0267-1 |