Muharaf-public

Bibliographic Details
Main authors:	Saeed, Mehreen; Chan, Adrian; Mijar, Anupam; Moukarzel, Joseph; Habchi, Georges; Younes, Carlos; Elias, Amin; Wong, Chau-Wai; Khater, Akram
Format:	Dataset
Language:	Arabic (ara)
Subjects:	Handwriting recognition; HTR; OCR; Optical character recognition

Description
Summary:	Manuscripts of Handwritten Arabic dataset (Muharaf) for cursive text recognition. The following files are present in this repository:
public_data_files.zip: Contains the public part of the Muharaf dataset, with the images and their corresponding annotation files in JSON and XML format.
public_line_images.zip: Contains the line images and their corresponding transcriptions.
public_summary_and_keywords.zip: Contains the summary and keywords extracted from the ground-truth transcriptions of each image.
sfr_files.zip: Contains the preprocessed files for the start_follow_read_arabic system, used for training on the public part of the Muharaf dataset.
public_1100_untrained.zip: Contains an initialized trial folder with 3 different random splits of (train, validation, test) to reproduce the experiments reported in the paper on Muharaf-public.
public_1100_trained.zip: Contains the results and model weights after training on Muharaf-public, with results for three different random splits of the (train, validation, test) sets.
trial_15_untrained.zip: Contains an initialized trial folder with 3 different random splits of (train, validation, test) to reproduce the experiments reported in the paper on training with all the files of the Muharaf dataset (1500 training images).
trial_15.zip: Contains the results and model weights after training on Muharaf, with results for three different random splits of the (train, validation, test) sets.
DOI:	10.5281/zenodo.11492214
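
The summary says that public_data_files.zip holds images together with JSON and XML annotation files, but it does not describe the folder layout inside the archive. The following is a minimal sketch for inspecting a downloaded archive before extracting it, using only the Python standard library. The helper names `summarize_archive` and `group_by_stem` are hypothetical, and the idea that an image and its annotations share a base name is an assumption for illustration, not something the record states.

```python
import zipfile
from collections import Counter, defaultdict
from pathlib import PurePosixPath


def summarize_archive(zip_source):
    """Count the files in a zip archive by lowercase extension."""
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            suffix = PurePosixPath(info.filename).suffix.lower() or "(none)"
            counts[suffix] += 1
    return dict(counts)


def group_by_stem(zip_source):
    """Map each path without its extension to the sorted extensions found.

    ASSUMPTION: an image and its annotation files share a base name
    (for example page.jpg, page.json, page.xml). Verify this against
    the real archive before relying on it.
    """
    groups = defaultdict(set)
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            path = PurePosixPath(info.filename)
            groups[str(path.with_suffix(""))].add(path.suffix.lower())
    return {stem: sorted(exts) for stem, exts in groups.items()}
```

Assuming the archive has been downloaded to the working directory, `summarize_archive("public_data_files.zip")` gives a quick count of each file type, and `group_by_stem` shows which entries lack a matching annotation under the shared-base-name assumption.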