A Restoration and Segmentation Unit for the Historic Persian Documents

Bibliographic details
Main authors: Alirezaee, Shahpour; Fard, Alireza Shayesteh; Aghaeinia, Hassan; Faez, Karim
Format: Conference proceedings
Language: English
Online access: Full text
Description
Summary: This paper presents a document restoration and segmentation algorithm for historic Middle Persian (Pahlavi) manuscripts. The proposed algorithm uses mathematical morphology and connected-component analysis to segment the overlapping lines, words, and characters of the Middle-age Persian documents in preparation for OCR. The restoration algorithm is evaluated on 200 pages of Pahlavi documents. Numerical results indicate that the proposed algorithm can remove the noise and destructive effects. The results also show 99.14% accuracy for baseline detection, 97.35% accuracy for text line extraction and removal of overlaps from other lines, and 99.5% accuracy for segmenting the extracted text lines into their components.
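Illustration: the summary names mathematical morphology and connected components but gives no procedure, so the following is only a minimal sketch of how the two ideas can combine, not the authors' algorithm. It assumes a binarized page stored as nested lists of 0/1 values. The function names, the horizontal structuring element, and the toy page are all hypothetical.

```python
from collections import deque

def dilate_horizontal(img, radius):
    """Binary dilation with a 1 x (2*radius+1) horizontal structuring element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            lo, hi = max(0, x - radius), min(w, x + radius + 1)
            out[y][x] = 1 if any(img[y][lo:hi]) else 0
    return out

def connected_components(img):
    """Bounding boxes (top, left, bottom, right) of 8-connected foreground regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

# Toy page (hypothetical data): three vertical strokes, two close together.
page = [
    [0, 1, 0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 1, 0],
]
glyphs = connected_components(page)                        # one box per stroke
words = connected_components(dilate_horizontal(page, 1))   # nearby strokes merged
```

On the raw page each stroke is its own component. After the horizontal dilation, strokes separated by a one-pixel gap fuse into a single word-level component while the distant stroke stays separate. A real pipeline would likely use a library such as SciPy or OpenCV and would tune the structuring element to the script.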
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/11558484_85