SYSTEM AND METHODS FOR ARABIC TEXT RECOGNITION AND ARABIC CORPUS BUILDING


Bibliographic Details
Main Authors: AL-OMARI HUSSEIN K, ALOBAID ABDULAZIZ OBAID, ALFALEH HUSSAM ABDULRAHMAN, OSFOOR MAJED IBRAHIM BIN, KHORSHEED MOHAMMAD S, ASFOUR ARWA IBRAHEM BIN
Format: Patent
Language: English
Description
Summary: A method for automatically recognizing Arabic text includes building an Arabic corpus comprising Arabic text files written in different writing styles and ground truths corresponding to each of the Arabic text files, storing writing-style indices in association with the Arabic text files, digitizing a line of Arabic characters to form an array of pixels, dividing the line of the Arabic characters into line images, forming a text feature vector from the line images, training a Hidden Markov Model using the Arabic text files and ground truths in the Arabic corpus in accordance with the writing-style indices, and feeding the text feature vector into a Hidden Markov Model to recognize the line of Arabic characters.
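
The steps "dividing the line into line images" and "forming a text feature vector" can be illustrated with a minimal Python sketch. It is not the patented method: the summary does not state the strip width or which features are computed, so the strip width, the right-to-left traversal, the two features (ink density and vertical centre of ink), and all function names below are assumptions chosen for illustration.

```python
from typing import List

Image = List[List[int]]  # rows of pixels: 1 = ink, 0 = background


def split_into_strips(line: Image, width: int) -> List[Image]:
    """Divide a digitized text line into vertical strips of `width` columns."""
    if width <= 0:
        raise ValueError("width must be positive")
    n_cols = len(line[0]) if line else 0
    strips = []
    # Assumption: walk from the right edge, since Arabic reads right to left.
    for right in range(n_cols, 0, -width):
        left = max(0, right - width)
        strips.append([row[left:right] for row in line])
    return strips


def strip_features(strip: Image) -> List[float]:
    """Hypothetical features for one strip: ink density and the
    vertical centre of the ink, normalized to the range 0..1."""
    height = len(strip)
    total = sum(len(row) for row in strip)
    ink = sum(sum(row) for row in strip)
    if ink == 0:
        return [0.0, 0.0]  # blank strip
    centre = sum(y * sum(row) for y, row in enumerate(strip)) / ink
    return [ink / total, centre / max(height - 1, 1)]


def line_feature_vectors(line: Image, width: int = 2) -> List[List[float]]:
    """Return one feature vector per strip, in reading order."""
    return [strip_features(s) for s in split_into_strips(line, width)]


if __name__ == "__main__":
    toy_line = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 0, 0],
    ]
    for vector in line_feature_vectors(toy_line, width=2):
        print(vector)
```

A sequence of such vectors is the kind of observation sequence a Hidden Markov Model consumes; training and decoding are left out here because the summary gives no model details.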