HIMANIS Guérin

Bibliographic details
Main authors: Stutzmann, Dominique; Hamel, Sébastien; Kernier, Iseut de; Mühlberger, Günter; Hackl, Günter
Format: Dataset
Language: French
Description
Summary: The HIMANIS Guérin dataset provides ground truth for HTR (Handwritten Text Recognition) training, covering 1217 images or parts of images and 30015 lines (933 images and 22093 lines in Guérin 1; 284 images and 7922 lines in Guérin 2). It was established as part of the HIMANIS research project in collaboration with the READ consortium (Recognition and Enrichment of Archival Documents). The base text is the edition by Paul Guérin, Recueil des documents concernant le Poitou contenus dans les registres de la Chancellerie de France, published between 1881 and 1919. The edition was digitized and OCR-processed by the Bibliothèque nationale de France, then encoded by the Ecole nationale des Chartes (http://corpus.enc.sorbonne.fr/actesroyauxdupoitou/), and then corrected and enhanced in HIMANIS, especially with regard to abbreviations and links to digital images (https://github.com/oriflamms/himanis/blob/master/Editions/Guerin_tome1-tome12.xml). The READ consortium aligned the text line by line in Transkribus for the acts whose coordinates were indicated in the HIMANIS project, mainly for the volumes Paris, Archives nationales, JJ 35 to JJ 91, supplemented by information for vol. 12 of Guérin's edition. This dataset comprises two Transkribus exports, enriched with links to images accessible via the IIIF protocol in the @corresp attribute of elements. The historical corpus is described in Stutzmann, Dominique, Jean-François Moufflet, and Sébastien Hamel. « La recherche en plein texte dans les sources manuscrites médiévales : enjeux et perspectives du projet HIMANIS pour l’édition électronique ». Médiévales : Langue, textes, histoire 73 (2017): 67‑96. https://doi.org/10.4000/medievales.8198. The present dataset is the training data for the "HIMANIS Chancery M1+" model; cf. https://readcoop.eu/model/french-and-latin-chancery-documents/
DOI: 10.5281/zenodo.5535305