EB_Composite: Knowledge Graph of the First Eight Editions of the Encyclopaedia Britannica (1768-1860) Following the Heritage Textual Ontology

Bibliographic Details
Main Authors: Yu, Lilin; Filgueira, Rosa
Format: Dataset
Language: eng
Description
Summary: The EB_Composite Knowledge Graph represents information from the first eight editions of the Encyclopaedia Britannica (1768–1860), structured using the Heritage Textual Ontology (HTO). It extends our previously developed EB_HQ by incorporating multiple text sources for each edition, such as the National Library of Scotland and the Nineteenth-Century Knowledge Project. One further source is added, comprising post-corrected text generated with deep-learning-based OCR error correction methods. These sources present different levels of text quality, and EB_Composite enables researchers to compare the texts and track how they were digitised or extracted from each source. EB_Composite contains 4,150,776 RDF triples. Like EB_HQ, it provides structured metadata and descriptions for each edition, volume, and term. By integrating information across editions, it supports tracking how concepts evolve over time. The dataset also features semantic connections to external knowledge bases such as DBpedia and Wikidata, which link entries to modern information and support more comprehensive analyses. Designed to support historical research, the dataset offers rich semantic data for exploring the development of knowledge and concepts in the Encyclopaedia Britannica. It categorises terms as either Articles or Topics, each with detailed metadata extracted from METS and ALTO XML files. OCR errors common in historical texts have been mitigated using deep-learning-based corrections.
DOI: 10.5281/zenodo.13920093
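
The summary describes following a term across editions and comparing the text sources it was drawn from. The sketch below illustrates that idea on a toy set of triples in plain Python. Everything in it is an assumption made for illustration: the `ex:` identifiers, the predicate names, the term names, and the source labels are invented and are not HTO classes or properties, nor actual entries in EB_Composite.

```python
from collections import defaultdict

# Toy (subject, predicate, object) triples. All identifiers, predicates,
# term names, and source labels are hypothetical, not taken from the dataset.
TRIPLES = [
    ("ex:term_a1", "ex:name", "ABACUS"),
    ("ex:term_a1", "ex:edition", 1),
    ("ex:term_a1", "ex:textFrom", "source_A"),
    ("ex:term_a7", "ex:name", "ABACUS"),
    ("ex:term_a7", "ex:edition", 7),
    ("ex:term_a7", "ex:textFrom", "source_B"),
    ("ex:term_b1", "ex:name", "ZEBRA"),
    ("ex:term_b1", "ex:edition", 1),
    ("ex:term_b1", "ex:textFrom", "source_A"),
]


def build_records(triples):
    """Fold triples into one dict of predicate -> object per subject."""
    records = defaultdict(dict)
    for subject, predicate, obj in triples:
        records[subject][predicate] = obj
    return dict(records)


def editions_of(name, triples):
    """Return (edition, source) pairs for a term name, ordered by edition."""
    records = build_records(triples)
    hits = [
        (rec["ex:edition"], rec["ex:textFrom"])
        for rec in records.values()
        if rec.get("ex:name") == name
    ]
    return sorted(hits)


print(editions_of("ABACUS", TRIPLES))
```

Against the real graph, the same question would more likely be asked as a SPARQL query using the HTO vocabulary; the dictionary fold here only shows the shape of the lookup.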